Abstract

We consider the problem of estimating the slope parameter in circular functional linear regression, where scalar responses $Y_1,\dots,Y_n$ are modeled in dependence of 1-periodic, second order stationary random functions $X_1,\dots,X_n$. We consider an orthogonal series estimator of the slope function $\beta$, obtained by replacing the first $m$ theoretical coefficients of its development in the trigonometric basis by adequate estimators. We propose a model selection procedure for $m$ in a set of admissible values, by defining a contrast function minimized by our estimator and a theoretical penalty function; this first step assumes the degree of ill-posedness to be known. Then we generalize the procedure to a random set of admissible $m$'s and a random penalty function. The resulting estimator is completely data driven and automatically reaches what is known to be the optimal minimax rate of convergence, in terms of a general weighted $L^2$-risk. This means that we provide adaptive estimators of both $\beta$ and its derivatives.

Keywords: Orthogonal series estimation; model selection; derivatives estimation; mean squared error of prediction; minimax theory.
1 Introduction

Functional linear models have become very important in a diverse range of disciplines, including medicine, linguistics, chemometrics as well as econometrics (see for instance Ramsay and Silverman [2005] and Ferraty and Vieu [2006] for several case studies, or, more specifically, Forni and Reichlin [1998] and Preda and Saporta [2005] for applications in economics).
Roughly speaking, in all these applications the dependence of a response variable Y on the 
variation of an explanatory random function X is modeled by 

$$Y = \int_0^1 \beta(t)X(t)\,dt + \sigma\varepsilon, \qquad \sigma > 0, \tag{1.1}$$

for some error term $\varepsilon$. One objective is then to estimate nonparametrically the slope function $\beta$ based on an independent and identically distributed (i.i.d.) sample of $(Y,X)$.
In this paper we suppose that the random function $X$ takes its values in $L^2[0,1]$, which is endowed with the usual inner product $\langle\cdot,\cdot\rangle$ and induced norm $\|\cdot\|$, and that $X$ has a finite second moment, i.e., $\mathbb E\|X\|^2<\infty$. In order to simplify notation we assume that the mean function of $X$ is zero. Moreover, the random function $X$ and the error term $\varepsilon$ are uncorrelated, where $\varepsilon$ is assumed to have mean zero and variance one. This situation has been considered, for example, in Cardot et al. [2003], Müller and Stadtmüller [2005] or, most recently, James et al. [2009]. Then multiplying both sides of (1.1) by $X(s)$ and taking the expectation leads to

$$g(s) := \mathbb E[YX(s)] = \int_0^1 \beta(t)\,\mathrm{cov}(X(t),X(s))\,dt =: [\Gamma\beta](s), \qquad s\in[0,1], \tag{1.2}$$

where $g$ belongs to $L^2[0,1]$ and $\Gamma$ denotes the covariance operator associated to the random function $X$. We shall assume that there exists a unique solution $\beta\in L^2[0,1]$ of equation (1.2). Estimation of $\beta$ is thus linked with the inversion of the covariance operator $\Gamma$, which is known to be an ill-posed inverse problem (for a detailed discussion in the context of inverse problems see chapter 2.1 in Engl et al. [2000], while in the special case of a functional linear model we refer to Cardot et al. [2003]).

In this paper we consider a circular functional linear model (defined below), where the associated covariance operator $\Gamma$ admits a spectral decomposition $\{\lambda_j,\varphi_j, j\ge1\}$ given by the trigonometric basis $\{\varphi_j\}$ as eigenfunctions and a strictly positive, possibly not ordered, zero-sequence $\lambda := (\lambda_j)_{j\ge1}$ of corresponding eigenvalues. Then the normal equation can be rewritten as follows

$$\beta = \sum_{j=1}^\infty \frac{[g]_j}{\lambda_j}\,\varphi_j \quad\text{with}\quad [g]_j := \langle g,\varphi_j\rangle,\ j\ge1. \tag{1.3}$$

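For estimation purposes, we replace the unknown quantities $[g]_j$ and $\lambda_j$ in equation (1.3) by their empirical counterparts. That is, if $(Y_1,X_1),\dots,(Y_n,X_n)$ denotes an i.i.d. sample of $(Y,X)$, then for each $j\ge1$ we consider the unbiased estimators

$$[\widehat g]_j := \frac1n\sum_{i=1}^n Y_i[X_i]_j \quad\text{and}\quad \widehat\lambda_j := \frac1n\sum_{i=1}^n [X_i]_j^2 \quad\text{with}\quad [X_i]_j := \langle X_i,\varphi_j\rangle$$

for $[g]_j$ and $\lambda_j$, respectively. The orthogonal series estimator $\widehat\beta_m$ of $\beta$ is then defined by

$$\widehat\beta_m := \sum_{j=1}^m \frac{[\widehat g]_j}{\widehat\lambda_j}\,\mathbb{1}\{\widehat\lambda_j\ge 1/n\}\,\varphi_j. \tag{1.4}$$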

Note that we introduce an additional threshold $1/n$ on each estimated eigenvalue $\widehat\lambda_j$, since it could be arbitrarily close to zero even when the true eigenvalue $\lambda_j$ is sufficiently far away from zero. Moreover, the orthogonal series estimator keeps only $m$ coefficients; this is an alternative to the popular Tikhonov regularization (cf. Hall and Horowitz [2007]), where in (1.3) the factor $1/\lambda_j$ is replaced by $\lambda_j/(\alpha+\lambda_j^2)$. Thresholding in the Fourier domain has been used, for example, in a deconvolution problem in Mair and Ruymgaart [1996] or Neumann [1997], and coincides with an approach called spectral cut-off in the numerical analysis literature (cf. Tautenhahn [1996]).
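The estimator (1.4) only needs the empirical scores $[X_i]_j$. The following Python sketch is a minimal illustration, assuming that the curves are observed on an equispaced grid of $T$ points in $[0,1)$, so that $\langle X_i,\varphi_j\rangle$ is approximated by a Riemann sum. The simulation design in `simulate` (eigenvalues, slope coefficients, noise level) is hypothetical and only serves to exercise the code.

```python
import numpy as np


def trig_basis(T, m):
    """Rows phi_1..phi_m of the trigonometric basis on the grid s_k = k/T."""
    s = np.arange(T) / T
    phi = np.empty((m, T))
    for j in range(1, m + 1):
        k = j // 2
        if j == 1:
            phi[0] = 1.0
        elif j % 2 == 0:
            phi[j - 1] = np.sqrt(2.0) * np.cos(2 * np.pi * k * s)
        else:
            phi[j - 1] = np.sqrt(2.0) * np.sin(2 * np.pi * k * s)
    return phi


def series_estimator(Y, X, m):
    """Coefficients [beta_hat_m]_j, j = 1..m, of the thresholded estimator (1.4).

    Y has shape (n,), X has shape (n, T) (curves sampled on the grid k/T).
    <X_i, phi_j> is approximated by a Riemann sum over the grid.
    """
    n, T = X.shape
    scores = X @ trig_basis(T, m).T / T          # [X_i]_j, shape (n, m)
    g_hat = (Y[:, None] * scores).mean(axis=0)   # empirical [g]_j
    lam_hat = (scores ** 2).mean(axis=0)         # empirical lambda_j
    coef = np.zeros(m)
    keep = lam_hat >= 1.0 / n                    # threshold 1/n on lambda_hat_j
    coef[keep] = g_hat[keep] / lam_hat[keep]
    return coef


def simulate(n, J=21, T=128, sigma=0.5, seed=0):
    """Hypothetical circular design X = sum_j sqrt(lambda_j) Z_j phi_j."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, J + 1)
    lam = (j // 2 + 1.0) ** -2         # lambda_1 = 1 and lambda_{2k} = lambda_{2k+1}
    beta = 1.0 / j ** 2                # hypothetical slope coefficients
    scores = rng.standard_normal((n, J)) * np.sqrt(lam)   # true [X_i]_j
    X = scores @ trig_basis(T, J)      # curves on the grid
    Y = scores @ beta + sigma * rng.standard_normal(n)
    return Y, X, beta
```

For example, `series_estimator(*simulate(20000)[:2], 5)` returns five coefficients that should be close to $1, 1/4, 1/9, 1/16, 1/25$; the eigenvalues are chosen in pairs so that the design respects $\lambda_{2k}=\lambda_{2k+1}$.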
In this paper we shall measure the performance of an estimator $\widehat\beta$ of $\beta$ by the $\mathcal F_\omega$-risk, that is $\mathbb E\|\widehat\beta-\beta\|_\omega^2$, where for some strictly positive sequence of weights $\omega := (\omega_j)_{j\ge1}$

$$\|f\|_\omega^2 := \sum_{j=1}^\infty \omega_j\,|\langle f,\varphi_j\rangle|^2 \quad\text{for all } f\in L^2[0,1].$$
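Computationally, the $\mathcal F_\omega$-risk is a weighted sum of squared coefficient errors. A small Python sketch, assuming (as an approximation) that functions are represented by finitely many Fourier coefficients: with $\omega\equiv1$ it reduces to the squared $L^2$ distance, and $\omega_j=j^{2s}$ is the family used in Section 2.3.

```python
import numpy as np


def weighted_sq_norm(coef, omega):
    """||f||_omega^2 = sum_j omega_j <f, phi_j>^2 for a finite coefficient vector."""
    coef = np.asarray(coef, dtype=float)
    omega = np.asarray(omega, dtype=float)
    return float(np.sum(omega * coef ** 2))


def polynomial_weights(m, s):
    """Weights omega_j = j^(2s), j = 1..m."""
    return np.arange(1, m + 1, dtype=float) ** (2 * s)


def weighted_loss(coef_hat, coef_true, omega):
    """||beta_hat - beta||_omega^2 restricted to the first len(omega) coefficients."""
    diff = np.asarray(coef_hat, dtype=float) - np.asarray(coef_true, dtype=float)
    return weighted_sq_norm(diff, omega)
```

Larger $s$ puts more weight on high-frequency errors, which is how the same risk covers derivative estimation.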

This general framework allows us, with appropriate choices of the weight sequence $\omega$, to cover the estimation not only of the slope parameter itself (cf. Hall and Horowitz [2007]) but also of its derivatives, as well as the optimal estimation with respect to the mean squared prediction error (cf. Cardot et al. [2003] or Crambes et al. [2009]). For a more detailed discussion, we refer to Cardot and Johannes [2009]. It is well known that the obtainable accuracy of any estimator in terms of the $\mathcal F_\omega$-risk is essentially determined by the regularity conditions imposed on both the slope parameter $\beta$ and the eigenvalues $\lambda$. In the literature the a-priori information on the slope parameter $\beta$, such as smoothness, is often characterized by considering ellipsoids (see definition below) in $L^2[0,1]$ with respect to a weighted norm $\|\cdot\|_\gamma$ for a pre-specified weight sequence $\gamma$. Moreover, it is usually assumed that the sequence $\lambda$ of eigenvalues of $\Gamma$ has a polynomial decay (cf. Hall and Horowitz [2007] or Crambes et al. [2009]). However, it is well known that this restriction may exclude several interesting cases, such as an exponential decay. Therefore, we do not impose a specific form of decay.

It is shown in Johannes [2009] that the estimator $\widehat\beta_m$ given in (1.4) is optimal in a minimax sense if the parameter $m = m(n)$ is appropriately chosen. Roughly speaking, the introduction of a dimension reduction implies a bias in addition to the classical variance term, which forces the statistician to make a compromise. The optimal choice of the dimension parameter $m$ requires a-priori knowledge about the sequences $\gamma$ and $\lambda$, which are unknown in practice. However, useful elements of this previous work are recalled in Section 2.

Our aim in this paper is to provide a data-driven method to select the dimension parameter $m$ in such a way that the compromise between bias and variance is automatically reached by the resulting estimator. The methodology is inspired by the works of Barron et al. [1999], now extensively described in Massart [2007], whose results, like ours, are in a non-asymptotic setting. By rewriting the estimator $\widehat\beta_m$ as a minimum contrast estimator over the function space $S_m$ (called model) linearly spanned by $\varphi_1,\dots,\varphi_m$, we can propose a model selection device by defining a penalty function. We obtain a selected $\widehat m$ in an admissible set of values of $m$. We first define and study in Section 3 the resulting estimator $\widehat\beta_{\widehat m}$ with a deterministic penalty and a deterministic set of admissible $m$'s: this requires assuming that the degree of ill-posedness of the problem is known. In other words, information is first supposed to be available about the order of the decay of the eigenvalues $\lambda_j$. This study provides the tools for the next and final step: in Section 4 we define a completely data-driven estimator, built by using a random penalty function and a random set of admissible dimensions $m$. We provide a general risk bound for this estimator and show that it can automatically reach the optimal rate of convergence, without requiring any a-priori knowledge. All proofs are gathered in the Appendix.

2 Background to the methodology. 

2.1 Notations and basic assumptions 

Circular functional linear model. In this paper we suppose that the regressor $X$ is 1-periodic, that is $X(0)=X(1)$, and second order stationary, i.e., there exists a positive definite covariance function $c:[-1,1]\to\mathbb R$ such that $\mathrm{cov}(X(t),X(s)) = c(t-s)$, $s,t\in[0,1]$. Then it is straightforward to see that the covariance function $c(\cdot)$ is 1-periodic too. In this situation, applying the covariance operator $\Gamma$ amounts to a convolution with the covariance function. Since $c(\cdot)$ is 1-periodic, it is easily seen from the classical convolution theorem that the eigenfunctions of the covariance operator $\Gamma$ are given by the trigonometric basis

$$\varphi_1(s) := 1,\qquad \varphi_{2k}(s) := \sqrt2\cos(2\pi ks),\qquad \varphi_{2k+1}(s) := \sqrt2\sin(2\pi ks),\qquad s\in[0,1],\ k\ge1,$$

and the corresponding eigenvalues satisfy

$$\lambda_1 = \int_0^1 c(s)\,ds,\qquad \lambda_{2k} = \lambda_{2k+1} = \int_0^1 \cos(2\pi ks)\,c(s)\,ds,\qquad k\ge1.$$

Notice that the eigenfunctions are known to the statistician and only the eigenvalues depend on the unknown covariance function $c(\cdot)$, i.e., have to be estimated.
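The statement that the trigonometric basis diagonalizes $\Gamma$ can be checked numerically. The Python sketch below discretizes $\Gamma$ on an equispaced grid (Riemann sum) for the hypothetical covariance $c(u) = 1 + 0.5\cos(2\pi u) + 0.25\cos(4\pi u)$, whose Fourier coefficients are nonnegative. For this choice the formulas above give $\lambda_1 = 1$, $\lambda_2=\lambda_3=0.25$ and $\lambda_4=\lambda_5=0.125$ (all further eigenvalues vanish, so this is an illustration only, not an element of the class considered below).

```python
import numpy as np


def cov_function(u):
    """Hypothetical 1-periodic covariance with nonnegative Fourier coefficients."""
    return 1.0 + 0.5 * np.cos(2 * np.pi * u) + 0.25 * np.cos(4 * np.pi * u)


def discretized_operator(c, T):
    """Matrix G with (G f)[k] ~ int_0^1 c(t - s_k) f(t) dt on the grid s_k = k/T."""
    s = np.arange(T) / T
    return c(s[None, :] - s[:, None]) / T   # entry [k, l] = c(s_l - s_k) / T


def eigenvalue(c, k, T=4096):
    """int_0^1 cos(2 pi k s) c(s) ds; k = 0 gives lambda_1, k >= 1 gives lambda_{2k}."""
    s = np.arange(T) / T
    return float(np.mean(np.cos(2 * np.pi * k * s) * c(s)))


T = 64
s = np.arange(T) / T
G = discretized_operator(cov_function, T)
for k in (1, 2):
    lam = eigenvalue(cov_function, k)
    cos_k = np.sqrt(2) * np.cos(2 * np.pi * k * s)   # phi_{2k} on the grid
    sin_k = np.sqrt(2) * np.sin(2 * np.pi * k * s)   # phi_{2k+1} on the grid
    print(k, lam, np.allclose(G @ cos_k, lam * cos_k), np.allclose(G @ sin_k, lam * sin_k))
```

Because the Riemann sum is exact for trigonometric polynomials of low degree on this grid, the eigen-relations hold up to floating-point error.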



Moment assumptions. The results derived below involve additional conditions on the moments of the random function $X$ and the error term $\varepsilon$, which we formalize now. Let $\mathcal X$ be the set of all centered 1-periodic and second order stationary random functions $X\in L^2[0,1]$ with finite second moment, i.e., $\mathbb E\|X\|^2<\infty$, and strictly positive covariance operator $\Gamma$. If $\lambda := (\lambda_j)_{j\ge1}$ denotes the sequence of eigenvalues associated to $\Gamma$, then given $X\in\mathcal X$ the random variables $\{[X]_j/\sqrt{\lambda_j},\ j\in\mathbb N\}$ are centered with variance one. Here and subsequently, we denote by $\mathcal X_\eta^k$, $k\in\mathbb N$, $\eta\ge1$, the subset of $\mathcal X$ containing only random functions $X$ such that the $k$-th moments of the corresponding random variables $[X]_j/\sqrt{\lambda_j}$, $j\in\mathbb N$, are uniformly bounded, that is

$$\mathcal X_\eta^k := \Big\{X\in\mathcal X \text{ such that } \sup_{j\in\mathbb N}\mathbb E\big|[X]_j/\sqrt{\lambda_j}\big|^k \le \eta\Big\}.$$

It is worth noting that in case $X\in\mathcal X$ is a Gaussian random function, the corresponding random variables $[X]_j/\sqrt{\lambda_j}$, $j\in\mathbb N$, are Gaussian with mean zero and variance one. Hence, if $\eta\ge3$, then any Gaussian random function $X\in\mathcal X$ belongs also to $\mathcal X_\eta^4$, and more generally, for each $k\in\mathbb N$, it belongs to $\mathcal X_\eta^k$ as soon as $\eta\ge\mathbb E|Z|^k$ with $Z\sim\mathcal N(0,1)$.



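Minimal regularity conditions. Given a strictly positive sequence of weights $w := (w_j)_{j\ge1}$, denote by $\mathcal F_w^c$ the ellipsoid with radius $c>0$, that is,

$$\mathcal F_w^c := \Big\{f\in L^2[0,1] : \sum_{j=1}^\infty w_j|\langle f,\varphi_j\rangle|^2 =: \|f\|_w^2 \le c\Big\}.$$

Furthermore, let $\mathcal F_w := \{f\in L^2[0,1] : \|f\|_w^2<\infty\}$ and $\langle f,g\rangle_w := \sum_{j=1}^\infty w_j\langle f,\varphi_j\rangle\langle\varphi_j,g\rangle$. Note that this weighted inner product induces the weighted norm $\|\cdot\|_w$.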

Here and subsequently, given strictly positive sequences of weights $\gamma := (\gamma_j)_{j\ge1}$ and $\omega := (\omega_j)_{j\ge1}$, we shall measure the performance of any estimator $\widehat\beta$ by its maximal $\mathcal F_\omega$-risk over the ellipsoid $\mathcal F_\gamma^\rho$ with radius $\rho>0$, that is $\sup_{\beta\in\mathcal F_\gamma^\rho}\mathbb E\|\widehat\beta-\beta\|_\omega^2$. We do not specify the sequences of weights $\gamma$ and $\omega$, but impose from now on the following minimal regularity conditions.

Assumption 2.1. Let $\omega := (\omega_j)_{j\ge1}$ and $\gamma := (\gamma_j)_{j\ge1}$ be positive sequences of weights with $\omega_1 = 1$ and $\gamma_1 = 1$ such that $(1/\gamma_j)_{j\ge1}$ and $(\omega_j/\gamma_j)_{j\ge1}$ are non-increasing zero-sequences.
Note that under Assumption 2.1 the ellipsoid $\mathcal F_\gamma^\rho$ is a subset of $\mathcal F_\omega$, and hence the $\mathcal F_\omega$-risk is a well-defined risk for $\beta$. Roughly speaking, if $\mathcal F_\gamma^\rho$ describes $p$-times differentiable functions, then Assumption 2.1 ensures that the $\mathcal F_\omega$-risk involves at most $s<p$ derivatives.

2.2 Minimax optimal estimation. 

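The objective of the paper is to construct an estimator which attains the minimax-optimal rate of convergence of the maximal $\mathcal F_\omega$-risk over the ellipsoid $\mathcal F_\gamma^\rho$ for a wide range of sequences $\gamma$ and $\omega$ satisfying Assumption 2.1, without using a-priori knowledge of either $\gamma$ or $\rho$. Therefore, let us first recall a lower bound which can be found in Johannes [2009]. Let $m^* := m_n^*\in\mathbb N$ for some $\Delta\ge1$ be chosen such that

$$\Delta^{-1} \le \frac{\gamma_{m_n^*}}{n\,\omega_{m_n^*}}\sum_{j=1}^{m_n^*}\frac{\omega_j}{\lambda_j} \le \Delta,$$

i.e., $(1/n)\sum_{j=1}^{m_n^*}\omega_j/\lambda_j$ and $\omega_{m_n^*}/\gamma_{m_n^*}$ have the same order.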

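Given an i.i.d. $n$-sample of $(Y,X)$ obeying (1.1) with $\sigma>0$ and $X\in\mathcal X$ with associated sequence of eigenvalues $\lambda$, we then have for any estimator $\widetilde\beta$ that

$$\sup_{\beta\in\mathcal F_\gamma^\rho}\mathbb E\|\widetilde\beta-\beta\|_\omega^2 \ge \frac14\min\Big(\frac{\sigma^2}{2\Delta},\,\rho\Big)\max\Big(\frac{\omega_{m_n^*}}{\gamma_{m_n^*}},\,\frac1n\Big) \quad\text{for all } n\ge1. \tag{2.1}$$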

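On the other hand, consider the estimator $\widehat\beta_m$ defined in (1.4) with dimension parameter $m = m_n^*$. If in addition $X\in\mathcal X_\eta^4$, then it is shown in Johannes [2009] that there exists a numerical constant $C>0$ such that

$$\sup_{\beta\in\mathcal F_\gamma^\rho}\mathbb E\|\widehat\beta_{m_n^*}-\beta\|_\omega^2 \le C\,\Delta^3\,\eta\,[\rho\,\mathbb E\|X\|^2+\sigma^2]\max\Big(\frac{\omega_{m_n^*}}{\gamma_{m_n^*}},\,\frac1n\Big).$$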

Therefore, the minimax-optimal rate of convergence is of order $O(\max(\omega_{m_n^*}/\gamma_{m_n^*},1/n))$. As a consequence, the orthogonal series estimator $\widehat\beta_{m_n^*}$ attains this optimal rate and hence is minimax-optimal. However, the definition of the dimension parameter $m_n^*$ used to construct the estimator involves a-priori knowledge of the sequences $\gamma$, $\omega$ and $\lambda$. Throughout the paper our aim is to construct a data-driven choice of the dimension parameter that does not require this a-priori knowledge and automatically attains the optimal rate of convergence.

2.3 Example of rates 

In this section we compute the rates that can be obtained in three configurations for the sequences $\gamma$, $\omega$ and $\lambda$. These cases will be referred to in the following. In all three cases, we take the sequence $\omega$ with $\omega_j = j^{2s}$, $j\ge1$, for $s\in\mathbb R$.

Case [P-P] Polynomial-Polynomial. Consider sequences $\gamma$ and $\lambda$ with $\gamma_j = j^{2p}$, $j\ge1$, for $p>\max(0,s)$, and $\lambda_j\asymp j^{-2a}$, $j\ge1$, for $a>1/2$, respectively, where the notation $u_j\asymp v_j$, $j\ge1$, means that there exists a constant $d>0$ such that $u_j/d\le v_j\le d\,u_j$ for all $j\ge1$. Then it is easily seen that $(m_n^*)^{2(s-p)} \asymp n^{-1}\sum_{j=1}^{m_n^*}\omega_j/\lambda_j \asymp n^{-1}\sum_{j=1}^{m_n^*} j^{2s+2a}$, and hence $m_n^*\asymp n^{1/(2p+2a+1)}$ if $2s+2a+1>0$, $m_n^*\asymp n^{1/(2(p-s))}$ if $2s+2a+1<0$, and $m_n^*\asymp (n/\log(n))^{1/(2(p-s))}$ if $2a+2s+1=0$. Finally, the optimal rate attained by the estimator is $\max(n^{-(2p-2s)/(2a+2p+1)},\,n^{-1})$ if $2s+2a+1\ne0$ (and $\log(n)/n$ if $2s+2a+1=0$). Observe that an increasing value of $a$ leads to a slower optimal rate of convergence. Therefore, the parameter $a$ is called the degree of ill-posedness (cf. Natterer [1984]).
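The balance defining $m_n^*$ is easy to evaluate numerically. The Python sketch below assumes exact power laws in case [P-P] and takes as a representative of $m_n^*$ the smallest $m$ with $\frac{\gamma_m}{n\,\omega_m}\sum_{j\le m}\omega_j/\lambda_j\ge1$; this is one admissible choice, with the constant $\Delta$ of Section 2.2 left implicit. With the illustrative values $s=0$, $p=2$, $a=1$, the rule $m_n^*\asymp n^{1/(2p+2a+1)} = n^{1/7}$ can be checked directly.

```python
def m_star(n, omega, gamma, lam, m_max=100_000):
    """Smallest m with gamma_m / (n * omega_m) * sum_{j <= m} omega_j / lam_j >= 1.

    omega, gamma, lam are functions j -> weight (hypothetical interface).
    """
    partial = 0.0
    for m in range(1, m_max + 1):
        partial += omega(m) / lam(m)          # running sum_{j <= m} omega_j / lambda_j
        if gamma(m) / (n * omega(m)) * partial >= 1.0:
            return m
    raise ValueError("no dimension below m_max balances bias and variance")


# Case [P-P] with s = 0, p = 2, a = 1 (illustrative values).
def omega(j):
    return 1.0


def gamma(j):
    return float(j) ** 4


def lam(j):
    return float(j) ** -2


for n in (10**7, 10**14):
    m = m_star(n, omega, gamma, lam)
    print(n, m, m / n ** (1 / 7))
```

The ratio $m_n^*/n^{1/7}$ stays bounded as $n$ grows by seven orders of magnitude, in line with the polynomial rate stated above.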

Remark 2.1. Obviously the rate is parametric if $2a+2s+1<0$. The case $0\le s<p$ can be interpreted as the $L^2$-risk of an estimator of the $s$-th derivative of the slope parameter $\beta$. On the other hand, the case $s=-a$ corresponds to the mean prediction error (cf. Cardot and Johannes [2009]). □

Case [E-P] Exponential-Polynomial. Consider sequences $\gamma$ and $\lambda$ with $\gamma_j = \exp(j^{2p})$, $j\ge1$, for $p>0$, and (as previously) $\lambda_j\asymp j^{-2a}$, $j\ge1$, for $a>1/2$, respectively. Then $m_n^*$ is such that $\exp(-(m_n^*)^{2p})\,(m_n^*)^{2s} \asymp n^{-1}\sum_{j=1}^{m_n^*} j^{2s+2a}$. In case $2a+2s+1>0$ this is equivalent to $\exp(-(m_n^*)^{2p})\asymp (m_n^*)^{2a+1}n^{-1}$ and hence $m_n^*\asymp(\log n-\log(\log n))^{1/(2p)}$. Thereby, $n^{-1}(\log n)^{(2a+1+2s)/(2p)}$ is the optimal rate attained by the estimator. Furthermore, if $2a+2s+1<0$, then $m_n^*\asymp(\log(n)+(s/p)\log(\log(n)))^{1/(2p)}$ and the rate is parametric, while if $2a+2s+1=0$, the rate is of order $\log(\log(n))/n$.

Case [P-E] Polynomial-Exponential. Consider sequences $\gamma$ and $\lambda$ with $\gamma_j = j^{2p}$, $j\ge1$, for $p>\max(0,s)$, and $\lambda_j\asymp\exp(-j^{2a})$, $j\ge1$, for $a>0$, respectively. Then $(m_n^*)^{2(s-p)} \asymp n^{-1}\sum_{j=1}^{m_n^*}\omega_j/\lambda_j \asymp n^{-1}\sum_{j=1}^{m_n^*} j^{2s}\exp(j^{2a})$ and hence $m_n^*\asymp(\log n-\log(\log n))^{1/(2a)}$. Thereby, $(\log n)^{-(p-s)/a}$ is the optimal rate attained by the estimator. The parameter $a$ again reflects the degree of ill-posedness, since here too an increasing value of $a$ leads to a slower optimal rate of convergence.

3 A model selection approach: known degree of ill-posedness 

In the previous section, we have recalled an estimation procedure that attains the optimal rate of convergence in case the slope parameter belongs to some ellipsoid and its accuracy is measured by an $\mathcal F_\omega$-risk. In this section, we suppose that a-priori knowledge is available concerning the degree of ill-posedness, that is, the asymptotic behavior of the sequence of eigenvalues $\lambda$ is known. The objective is the construction of an adaptive estimator which depends neither on the sequence of weights $\gamma$ nor on the radius $\rho$ but still attains the optimal rate over the ellipsoid $\mathcal F_\gamma^\rho$. In this section, we use the following assumption.

Assumption 3.1. Let $\lambda := (\lambda_j)_{j\ge1}$ denote the sequence of eigenvalues associated to the regressor $X$ and let $\omega := (\omega_j)_{j\ge1}$ be a sequence satisfying Assumption 2.1 such that

(i) there exist non-decreasing sequences $\delta := \delta(\lambda,\omega) := (\delta_m(\lambda,\omega))_{m\ge1}$ and $\Delta := \Delta(\lambda,\omega) := (\Delta_m(\lambda,\omega))_{m\ge1}$ with $\delta_m\ge\sum_{j=1}^m\omega_j/\lambda_j$ and $\Delta_m\ge\max_{1\le j\le m}\omega_j/\lambda_j$ for all $m\ge1$ such that, for some $\Sigma>0$,

$$\sum_{m\ge1}\Delta_m\exp\Big(-\frac{\delta_m}{6\Delta_m}\Big)\le\Sigma; \tag{3.1}$$

(ii) the sequence $M := (M_n)_{n\ge1}$ given by $M_n := \arg\max\{1\le m\le n : \delta_m\le n\,(\omega_m)\wedge1\}$, $n\ge1$, with $(q)\wedge1 := \min(q,1)$, satisfies

$$\min_{1\le j\le M_n}\lambda_j\ge 2/n \quad\text{for all } n\ge1. \tag{3.2}$$

It is worth noting that both sequences $\delta$ and $M$ depend on the eigenvalues $\lambda$.

3.1 Definition of the estimator. 

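Consider the orthogonal series estimator $\widehat\beta_m$ defined in (1.4). In what follows we construct an adaptive procedure to choose the dimension parameter $m$ based on a model selection approach. Therefore, let $\widehat\Phi_u := \sum_{j\ge1}\widehat\lambda_j^{-1}\mathbb{1}\{\widehat\lambda_j\ge1/n\}[u]_j\varphi_j$ for $u\in L^2[0,1]$ with Fourier coefficients $[u]_j := \langle u,\varphi_j\rangle$ (below, $\widehat g$ is identified with its coefficient sequence $([\widehat g]_j)_{j\ge1}$). Then we consider the contrast

$$\Upsilon(t) := \|t\|_\omega^2 - 2\langle t,\widehat\Phi_{\widehat g}\rangle_\omega. \tag{3.3}$$

Define $S_m := \mathrm{span}\{\varphi_1,\dots,\varphi_m\}$. Obviously, for all $t\in S_m$ it follows that $\langle t,\widehat\Phi_{\widehat g}\rangle_\omega = \langle t,\widehat\beta_m\rangle_\omega$ and hence $\Upsilon(t) = \|t-\widehat\beta_m\|_\omega^2 - \|\widehat\beta_m\|_\omega^2$. Therefore, we have for all $m\ge1$

$$\arg\min_{t\in S_m}\Upsilon(t) = \widehat\beta_m.$$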

Let $X\in\mathcal X_\eta^4$ and $\mathbb E|Y/\sigma_Y|^4\le\eta$ with $\sigma_Y^2 := \mathrm{Var}(Y)$. Under Assumption 3.1, we consider the penalty function

$$\mathrm{pen}(m) := 192\,\sigma_Y^2\,\eta\,\frac{\delta_m}{n}.$$

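The adaptive estimator $\widehat\beta_{\widehat m}$ is obtained from (1.4) by choosing the dimension parameter

$$\widehat m := \arg\min_{1\le m\le M_n}\big\{\Upsilon(\widehat\beta_m)+\mathrm{pen}(m)\big\}. \tag{3.4}$$

Note that we can compute

$$\Upsilon(\widehat\beta_m) = -\|\widehat\beta_m\|_\omega^2 = -\sum_{j=1}^m\omega_j\,\frac{[\widehat g]_j^2}{\widehat\lambda_j^2}\,\mathbb{1}\{\widehat\lambda_j\ge1/n\}.$$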
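Since $\Upsilon(\widehat\beta_m)$ is a cumulative sum, the whole path $m\mapsto\Upsilon(\widehat\beta_m)+\mathrm{pen}(m)$ costs $O(M_n)$ operations once $[\widehat g]_j$ and $\widehat\lambda_j$ are available. A minimal Python sketch of (3.4), assuming, as in Remark 3.1, that $\sigma_Y^2$ and $\eta$ are supplied as known, and that the sequences $\delta$ and $M_n$ are provided by the user (for instance from Proposition 3.3 or 3.4 below); the values in the usage example are hypothetical.

```python
import numpy as np


def contrast_path(g_hat, lam_hat, omega, n):
    """Upsilon(beta_hat_m) = -sum_{j<=m} omega_j g_hat_j^2 / lam_hat_j^2 1{lam_hat_j >= 1/n}, m = 1..len."""
    g_hat, lam_hat, omega = (np.asarray(a, dtype=float) for a in (g_hat, lam_hat, omega))
    keep = lam_hat >= 1.0 / n
    safe_lam = np.where(keep, lam_hat, 1.0)        # avoid dividing by discarded eigenvalues
    terms = np.where(keep, omega * g_hat ** 2 / safe_lam ** 2, 0.0)
    return -np.cumsum(terms)


def select_dimension(g_hat, lam_hat, omega, delta, n, sigma_y2, eta, M_n):
    """m_hat of (3.4) with pen(m) = 192 sigma_Y^2 eta delta_m / n, searched over 1..M_n."""
    ups = contrast_path(g_hat[:M_n], lam_hat[:M_n], omega[:M_n], n)
    pen = 192.0 * sigma_y2 * eta * np.asarray(delta[:M_n], dtype=float) / n
    return int(np.argmin(ups + pen)) + 1           # dimensions are 1-based
```

For instance, with $\widehat\lambda_j\equiv1$, $\omega\equiv1$, $[\widehat g]=(1,0.5,0,\dots,0)$, a placeholder $\delta_m=m$, $n=10^6$ and $\eta=3$, the contrast stops decreasing after $m=2$, and the penalty then selects $\widehat m=2$.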

Remark 3.1. Throughout the paper we ignore the fact that the values $\sigma_Y$ and $\eta$ are also unknown in practice. Obviously $\sigma_Y$ can be estimated straightforwardly by its empirical counterpart. Estimating the value $\eta$ is not a trivial task. However, if in addition the regressor $X$ and the error term $\varepsilon$ are Gaussian, then $Y\sim\mathcal N(0,\sigma_Y^2)$ and hence $\eta=3$ is known a priori. We may also take another point of view: if we choose a priori a sufficiently large $\eta\ge3$ (the Gaussian case is included), then the following assertions apply as long as the unknown data generating process satisfies the conditions $X\in\mathcal X_\eta^4$ and $\mathbb E|Y/\sigma_Y|^4\le\eta$. □

3.2 An upper bound. 

We first derive an upper bound for the adaptive estimator $\widehat\beta_{\widehat m}$ by assuming a-priori knowledge of appropriate sequences $\delta$ and $M$, which are used in the construction of the penalty and of the admissible set of values of $m$.
Theorem 3.1. Assume an $n$-sample of $(Y,X)$ satisfying (1.1). Let $\mathbb E|Y/\sigma_Y|^4\le\eta$ and let $X\in\mathcal X_\eta^4$ be 1-periodic and second order stationary with associated eigenvalues $\lambda$. Suppose that the sequences $\gamma$ and $\omega$ satisfy Assumption 2.1. Let $\delta$, $\Delta$ and $M$ be sequences satisfying Assumption 3.1 for some constant $\Sigma$. Consider the estimator $\widehat\beta_{\widehat m}$ defined in (1.4) with $\widehat m$ given by (3.4). If in addition $X\in\mathcal X_\xi^{24}$ and $\mathbb E|Y/\sigma_Y|^{24}\le\xi$, then there exists a numerical constant $C$ such that for all $n\ge1$ and $1\le m\le M_n$ we have

$$\sup_{\beta\in\mathcal F_\gamma^\rho}\mathbb E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2 \le C\Big\{\frac{\omega_m}{\gamma_m}\,\rho + \frac{\delta_m}{n}\,\eta\,(\rho\,\mathbb E\|X\|^2+\sigma^2)\Big\} + \frac Kn\,(\rho\,\mathbb E\|X\|^2+\sigma^2)\,[\delta_1+\rho]\,[1+(\mathbb E\|X\|^2)^2],$$

where $K = K(\Sigma,\eta,\xi,\delta_1)$ is a constant depending on $\Sigma$, $\eta$, $\xi$ and $\delta_1$ only.

It is worth noting that in the last assertion we do not require complete knowledge of the sequence of eigenvalues $\lambda$ associated to the regressor $X$. In the next corollary we state the upper bound obtained by balancing the terms depending on $m$, which is a direct consequence of Theorem 3.1.

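Corollary 3.2. Let the assumptions of Theorem 3.1 be satisfied. If in addition the sequence $m^\circ := (m_n^\circ)_{n\ge1}$ is chosen such that $\gamma_{m_n^\circ}\delta_{m_n^\circ}/(n\,\omega_{m_n^\circ})\asymp1$, $n\ge1$, then we have

$$\sup_{\beta\in\mathcal F_\gamma^\rho}\mathbb E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2 = O\Big(\max\Big(\frac{\omega_{m_n^\circ}}{\gamma_{m_n^\circ}},\,\frac1n\Big)\Big)\quad\text{as } n\to\infty.$$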

Remark 3.2. Comparing the last assertion with the lower bound given in (2.1), we see that the adaptive estimator attains the optimal rate of convergence as long as $\sup_{n\ge1}\omega_{m_n^\circ}\gamma_{m_n^*}/(\gamma_{m_n^\circ}\omega_{m_n^*})<\infty$. Obviously, a sufficient condition is that the sequence $\delta$ satisfies in addition $\sup_{m\ge1}\delta_m/(\sum_{j=1}^m\omega_j/\lambda_j)<\infty$. The polynomial case below provides an example. However, this condition is not necessary, as can be seen in the exponential case. □



3.3 Convergence rate of the theoretical adaptive estimator. 

We described in Section 2.3 three different cases where we could choose the dimension $m$ such that the resulting estimator reaches the optimal minimax rate. The following result shows that, in case of a known degree of ill-posedness, we can propose choices of sequences $\delta$, $\Delta$ and $M$ such that the penalized estimator automatically attains the optimal rate.

Proposition 3.3. In cases [P-P] and [E-P] with $2a+2s+1>0$, let $\delta_m\asymp m^{2a+2s+1}$, $\Delta_m\asymp m^{(2a+2s)\vee0}$ and $M_n\asymp n^{1/(2a+1+(2s)\vee0)}$ with $(q)\vee0 := \max(q,0)$. In case [P-E], choose $\delta_m\asymp m^{2a+1+(2s)\vee0}\exp(m^{2a})$, $\Delta_m\asymp m^{(2s)\vee0}\exp(m^{2a})$ and $M_n\asymp\big(\log\big(n/(\log n)^{(2a+1+(2s)\vee0)/(2a)}\big)\big)^{1/(2a)}$.

Then Assumption 3.1 is fulfilled and, under the additional assumptions of Theorem 3.1, the adaptive estimator $\widehat\beta_{\widehat m}$ reaches the optimal rate.

In cases [P-P] and [E-P], if $2a+2s+1<0$, then the sequence $\delta$ can be taken of order 1. The collection of models must be reduced to $\{[\sqrt n],\dots,n\}$, since $M_n$ can be taken equal to $n$. It then appears that the rate is parametric in this case. In fact, no model selection is necessary in this case: a large $m$ ($m=n$, for instance) can be chosen.

Now, we want to prepare the case where the degree of ill-posedness of the $\lambda_j$'s, and more precisely $\delta_m$ and $M_n$, are unknown. We propose hereafter a more intrinsic choice of $\delta_m$, which does not require anything but the $\lambda_j$'s (which can be estimated). In this spirit, we can prove the following assertion.

Proposition 3.4. In cases [P-P] and [E-P] with $a+s\ge0$, or in case [P-E], choose $\Delta_m := \max_{1\le j\le m}\omega_j/\lambda_j$, $\kappa_m := \max_{1\le j\le m}((\omega_j)\vee1)/\lambda_j$ with $(q)\vee1 := \max(q,1)$, and

$$\delta_m := m\,\Delta_m\,\frac{\log(\kappa_m\vee(m+2))}{\log(m+2)}. \tag{3.5}$$

Then Assumption 3.1 is fulfilled and, under the additional assumptions of Theorem 3.1, the adaptive estimator $\widehat\beta_{\widehat m}$ reaches the optimal rate.
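The sequences of Proposition 3.4 are computed from $\lambda$ and $\omega$ alone by running maxima. A Python sketch; note that, since the logarithmic factor in (3.5) is at least one and $m\Delta_m\ge\sum_{j\le m}\omega_j/\lambda_j$, the requirement $\delta_m\ge\sum_{j=1}^m\omega_j/\lambda_j$ of Assumption 3.1 (i) holds automatically, which the test checks in the hypothetical case [P-P] with $a=1$, $s=0$.

```python
import numpy as np


def prop34_sequences(omega, lam):
    """Delta_m, kappa_m and delta_m of Proposition 3.4 / (3.5), for m = 1..len(lam)."""
    omega = np.asarray(omega, dtype=float)
    lam = np.asarray(lam, dtype=float)
    m = np.arange(1, lam.size + 1)
    Delta = np.maximum.accumulate(omega / lam)                    # max_{j<=m} omega_j / lambda_j
    kappa = np.maximum.accumulate(np.maximum(omega, 1.0) / lam)   # max_{j<=m} (omega_j v 1) / lambda_j
    delta = m * Delta * np.log(np.maximum(kappa, m + 2)) / np.log(m + 2)
    return Delta, kappa, delta
```

The factor $\log(\kappa_m\vee(m+2))/\log(m+2)$ only inflates $m\Delta_m$ when $\kappa_m$ exceeds $m+2$, i.e., when the problem is ill-posed enough for it to matter.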

4 A model selection approach: unknown degree of ill-posedness 

In this section, the objective is the construction of a fully adaptive estimator which depends neither on the sequence $\gamma$ nor on $\lambda$. Nevertheless, the resulting estimator still attains the optimal rate in case the slope parameter belongs to some ellipsoid $\mathcal F_\gamma^\rho$ and the sequence of eigenvalues $\lambda$ associated to the covariance operator of $X$ has a given (unknown) rate of decrease.

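The configuration given in Proposition 3.4 is now the right reference and the choice that the estimator is going to mimic. In particular, it is easily seen that there always exists a constant $\Sigma>0$ such that the sequences $\delta$ and $\Delta$ given in Proposition 3.4 satisfy Assumption 3.1 (i). Indeed, using $\Delta_m\le\kappa_m$, we have in this situation

$$\begin{aligned}\Delta_m\exp\Big(-\frac{\delta_m}{6\Delta_m}\Big) &= \Delta_m\exp\Big(-\frac{m\log(\kappa_m\vee(m+2))}{6\log(m+2)}\Big) \le (\kappa_m\vee(m+2))\exp\Big(-\frac{m\log(\kappa_m\vee(m+2))}{6\log(m+2)}\Big)\\ &= \exp\Big(-m\,\frac{\log(\kappa_m\vee(m+2))}{\log(m+2)}\Big[\frac16-\frac{\log(m+2)}{m}\Big]\Big) \le (m+2)\exp\Big(-\frac m6\Big),\end{aligned}$$

where the last inequality holds as soon as $\log(m+2)/m\le1/6$, because $\log(\kappa_m\vee(m+2))\ge\log(m+2)$. The last term is obviously summable.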
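The summability argument above can be illustrated numerically. The Python sketch evaluates the terms $\Delta_m\exp(-\delta_m/(6\Delta_m))$ of (3.1) for the choice (3.5), using that $\delta_m/(6\Delta_m) = m\log(\kappa_m\vee(m+2))/(6\log(m+2))$, and compares them with the bound $(m+2)e^{-m/6}$ wherever $\log(m+2)/m\le1/6$. The two eigenvalue sequences (case [P-P] with $a=1$ and case [P-E] with $a=1/2$, both with $s=0$) are hypothetical examples.

```python
import numpy as np


def series_terms(omega, lam):
    """Terms Delta_m * exp(-delta_m / (6 Delta_m)) of (3.1) for the choice (3.5)."""
    omega = np.asarray(omega, dtype=float)
    lam = np.asarray(lam, dtype=float)
    m = np.arange(1, lam.size + 1)
    Delta = np.maximum.accumulate(omega / lam)
    kappa = np.maximum.accumulate(np.maximum(omega, 1.0) / lam)
    log_ratio = np.log(np.maximum(kappa, m + 2)) / np.log(m + 2)   # always >= 1
    # By (3.5), delta_m / (6 Delta_m) = m * log_ratio / 6, so no division by Delta is needed.
    return Delta * np.exp(-m * log_ratio / 6.0)


j = np.arange(1, 101)
examples = {
    "[P-P], a = 1, s = 0": (np.ones(100), j ** -2.0),
    "[P-E], a = 1/2, s = 0": (np.ones(100), np.exp(-j.astype(float))),
}
for name, (omega, lam) in examples.items():
    terms = series_terms(omega, lam)
    bound = (j + 2) * np.exp(-j / 6.0)               # (m + 2) exp(-m/6)
    valid = np.log(j + 2) / j <= 1.0 / 6.0           # range where the bound is claimed
    print(name, terms.sum(), bool(np.all(terms[valid] <= bound[valid] * (1 + 1e-12))))
```

In the severely ill-posed example $\Delta_m$ grows like $e^m$, yet the terms still vanish quickly, which is the point of the logarithmic factor in (3.5).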

Assumption 4.1. Let $\lambda$ denote the sequence of eigenvalues associated to the regressor $X$, let $\delta$ and $\Delta$ be the sequences defined in Proposition 3.4 and let $\gamma$ and $\omega$ be sequences satisfying Assumption 2.1 such that

(i) the sequence $M := (M_n)_{n\ge1}$ given in Assumption 3.1 satisfies, in addition to (3.2), also
$$\frac{\log n}{2n}\ge\max_{m>M_n}\frac{\lambda_m}{m\,(\omega_m)\vee1}\quad\text{for all } n\ge1;$$

(ii) the sequence $m^\circ := (m_n^\circ)_{n\ge1}$ given by $1/c\le\gamma_{m_n^\circ}\delta_{m_n^\circ}/(n\,\omega_{m_n^\circ})\le c$ for all $n\ge1$ and some $c\ge1$ satisfies
$$\min_{1\le m\le m_n^\circ}\frac{\lambda_m}{m\,(\omega_m)\vee1}\ge\frac{2\log n}{n}\quad\text{for all } n\ge1;$$

(iii) the sequence $N := (N_n)_{n\ge1}$ given by $N_n := \arg\max\{1\le N\le n : \max_{1\le j\le N}\omega_j/n\le1\}$, $n\ge1$, satisfies $M_n\le N_n\le n$ for all $n\ge1$.
Remark 4.1. The last assumption is technical but satisfied in the cases of interest. Note that (i) and (ii) together imply $m_n^\circ\le M_n$ for all $n\ge1$. The condition (iii) is rather weak; observe that the sequence $\omega$ is known a priori, and thus so is the sequence of upper bounds $N$. In particular, recall that in case $\omega\equiv1$ the $\mathcal F_\omega$-risk corresponds to the $L^2$-risk. If $\omega_m\le1$ for all $m\ge1$, then the $\mathcal F_\omega$-risk is weaker than the $L^2$-risk and $N_n=n$. Only if the $\mathcal F_\omega$-risk is stronger than the $L^2$-risk, that is, $\omega$ is monotonically increasing, do we choose $N_n$ such that $\omega_{N_n}\asymp n$. Then it is not hard to see that in these situations (iii) is satisfied, at least for sufficiently large $n$. □

4.1 Definition of the estimator 

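We follow the model selection approach presented in the last section. Define

$$\widehat\Delta_m := \max_{1\le j\le m}\frac{\omega_j}{\widehat\lambda_j}\quad\text{and}\quad\widehat\kappa_m := \max_{1\le j\le m}\frac{(\omega_j)\vee1}{\widehat\lambda_j}.$$

We shall refer to $\delta_m$ as defined in (3.5) and consider its estimator given by

$$\widehat\delta_m := m\,\widehat\Delta_m\,\frac{\log(\widehat\kappa_m\vee(m+2))}{\log(m+2)}.$$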

If $X\in\mathcal X_\eta^4$ and $\mathbb E|Y/\sigma_Y|^4\le\eta$, then we define a random penalty function

$$\widehat{\mathrm{pen}}(m) := 192\,\sigma_Y^2\,\eta\,\frac{\widehat\delta_m}{n}.$$

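Moreover, we consider a random upper bound for the collection of models given by

$$\widehat M_n := \arg\max\Big\{1\le M\le N_n : \min_{1\le j\le M}\frac{\widehat\lambda_j}{j\,(\omega_j)\vee1}\ge\frac{\log n}{n}\Big\}. \tag{4.1}$$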

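The adaptive estimator $\widehat\beta_{\widehat m}$ is obtained from (1.4) by choosing the dimension parameter

$$\widehat m := \arg\min_{1\le m\le\widehat M_n}\big\{\Upsilon(\widehat\beta_m)+\widehat{\mathrm{pen}}(m)\big\}. \tag{4.2}$$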

We emphasize that the proposed estimator requires a-priori knowledge of neither the sequence $\gamma$ nor the sequence $\lambda$.
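Combining the random model bound, the plug-in version of (3.5), the random penalty and the selection rule gives a completely data-driven procedure. A Python sketch under stated assumptions: $\sigma_Y^2$ and $\eta$ are supplied as known (Remark 3.1), the inputs are given for $j=1,\dots,N_n$, $\widehat M_n$ is read as the largest $M$ such that every $j\le M$ satisfies $\widehat\lambda_j/(j\,(\omega_j)\vee1)\ge(\log n)/n$, and falling back to a single model when no index passes is a hypothetical convention, not taken from the text.

```python
import numpy as np


def fully_adaptive_dimension(g_hat, lam_hat, omega, n, sigma_y2, eta):
    """Return (m_hat, M_hat): random model bound and penalized choice of the dimension.

    g_hat, lam_hat, omega contain [g_hat]_j, lam_hat_j, omega_j for j = 1..N_n.
    sigma_y2 and eta are treated as known (Remark 3.1).
    """
    g_hat, lam_hat, omega = (np.asarray(a, dtype=float) for a in (g_hat, lam_hat, omega))
    j = np.arange(1, lam_hat.size + 1)
    # Assumed reading of the model bound: largest M such that every j <= M passes the test.
    ratio = lam_hat / (j * np.maximum(omega, 1.0))
    passed = np.minimum.accumulate(ratio) >= np.log(n) / n
    M_hat = max(int(passed.sum()), 1)             # hypothetical fallback to one model
    g, lam, w, m = g_hat[:M_hat], lam_hat[:M_hat], omega[:M_hat], j[:M_hat]
    # Plug-in versions of Delta_m, kappa_m and delta_m from (3.5).
    Delta_hat = np.maximum.accumulate(w / lam)
    kappa_hat = np.maximum.accumulate(np.maximum(w, 1.0) / lam)
    delta_hat = m * Delta_hat * np.log(np.maximum(kappa_hat, m + 2)) / np.log(m + 2)
    pen_hat = 192.0 * sigma_y2 * eta * delta_hat / n
    # Contrast Upsilon(beta_hat_m) as a cumulative sum over thresholded coefficients.
    keep = lam >= 1.0 / n
    contrast = -np.cumsum(np.where(keep, w * g ** 2 / np.where(keep, lam, 1.0) ** 2, 0.0))
    m_hat = int(np.argmin(contrast + pen_hat)) + 1
    return m_hat, M_hat
```

For example, with hypothetical inputs $\widehat\lambda_j=j^{-2}$, $\omega\equiv1$, $[\widehat g]_j=\widehat\lambda_j[\beta]_j$ for $[\beta]=(1,0.5,0,\dots)$, $n=10^6$, $\sigma_Y^2=1$ and $\eta=3$, the random bound keeps the first 41 models and the penalized criterion selects $\widehat m=2$.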

4.2 An upper bound. 

In the next assertion we provide an upper bound for the fully adaptive estimator $\widehat\beta_{\widehat m}$ by assuming that the sequences $\lambda$, $\omega$ and $\gamma$ satisfy Assumption 4.1.

Theorem 4.1. Assume an $n$-sample of $(Y,X)$ satisfying (1.1). Suppose that $\mathbb E|Y/\sigma_Y|^4\le\eta$ and that $X\in\mathcal X_\eta^4$ is 1-periodic and second order stationary. Let Assumption 4.1 be satisfied. Consider the estimator $\widehat\beta_{\widehat m}$ defined in (1.4) with $\widehat m$ given by (4.2). If in addition $X\in\mathcal X_\xi^{28}$ and $\mathbb E|Y/\sigma_Y|^{28}\le\xi$, then there exists a numerical constant $C>0$ such that for all $n\ge1$

$$\sup_{\beta\in\mathcal F_\gamma^\rho}\mathbb E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2 \le C\big(\rho + c\,\eta\,[\rho\,\mathbb E\|X\|^2+\sigma^2]\big)\frac{\omega_{m_n^\circ}}{\gamma_{m_n^\circ}} + \frac Kn\,[\rho\,\mathbb E\|X\|^2+\sigma^2]\,[1+\delta_1+\rho]\,[1+(\mathbb E\|X\|^2)^2],$$

where $m^\circ$ and $c$ are defined in Assumption 4.1, and $K = K(\Sigma,\eta,\xi,\delta_1)$ is a constant depending only on $\eta$, $\xi$, $\delta_1$ and $\Sigma$, the latter being such that the sequences $\delta$ and $\Delta$ given in Proposition 3.4 satisfy Assumption 3.1.

Remark 4.2. Comparing the last assertion with Theorem 3.1, we see that under Assumption 4.1 the proposed adaptive estimator attains the same rate as in the case of a known degree of ill-posedness. We only have to impose slightly stronger moment conditions in addition. □

It is easily verified that in all the examples discussed above the fully adaptive estimator attains the optimal rate, which is summarized in the next assertion.

Corollary 4.2. In cases [P-P] and [E-P] with $a+s\ge0$, or in case [P-E], Assumption 4.1 is fulfilled and, under the additional assumptions of Theorem 4.1, the fully adaptive estimator $\widehat\beta_{\widehat m}$ with $\widehat m$ given by (4.2) reaches the optimal rate.



Conclusion. Assuming a circular functional linear model, we derive in this paper a fully adaptive estimator of the slope function $\beta$ or its derivatives, which attains the minimax-optimal rate of convergence. It is worth noting that in this paper not only the penalty but also the collection of models is chosen randomly. In this way the proposed estimator is also adaptive with respect to the degree of ill-posedness of the underlying inverse problem. We can thereby handle both the mildly and the severely ill-posed case.

It is not clear whether the ideas in this paper can be straightforwardly adapted to treat the case of non-circular functional linear models. We are currently exploring this issue.

A Appendix 

A.1 Proof of Theorem 3.1

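We begin by defining and recalling notation to be used in the proof. Given $u\in L^2[0,1]$, we denote by $[u]$ the infinite vector of Fourier coefficients $[u]_j := \langle u,\varphi_j\rangle$. In particular, we use the notations

$$\sigma_Y^2 := \mathrm{Var}(Y),\qquad \beta_m := \sum_{j=1}^m[\beta]_j\varphi_j,\qquad \widehat\beta_m = \sum_{j=1}^m\frac{[\widehat g]_j}{\widehat\lambda_j}\mathbb{1}\{\widehat\lambda_j\ge1/n\}\varphi_j,$$
$$\widehat\Phi_u := \sum_{j\ge1}\frac{[u]_j}{\widehat\lambda_j}\mathbb{1}\{\widehat\lambda_j\ge1/n\}\varphi_j,\qquad \Phi_u := \sum_{j\ge1}\frac{[u]_j}{\lambda_j}\varphi_j, \tag{A.1}$$

so that, in particular, $\Phi_g=\beta$ by (1.3).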

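Given $m\ge1$, we then have for all $t\in S_m = \mathrm{span}\{\varphi_1,\dots,\varphi_m\}$

$$\langle t,\widehat\Phi_{\widehat g}\rangle_\omega = \sum_{j=1}^m\omega_j[t]_j\frac{[\widehat g]_j}{\widehat\lambda_j}\mathbb{1}\{\widehat\lambda_j\ge1/n\} = \frac1n\sum_{i=1}^n Y_i\sum_{j=1}^m\omega_j[t]_j\frac{[X_i]_j}{\widehat\lambda_j}\mathbb{1}\{\widehat\lambda_j\ge1/n\} = \frac1n\sum_{i=1}^n Y_i\,\langle t,\widehat\Phi_{X_i}\rangle_\omega.$$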
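Furthermore, define the event

$$\Omega_{Y,X} := \big\{|Y/\sigma_Y|\le n^{1/6},\ |[X]_j/\sqrt{\lambda_j}|\le n^{1/6},\ 1\le j\le M_n\big\}$$

and denote its complement by $\Omega_{Y,X}^c$. Then consider the functions $h$ and $f$ with Fourier coefficients given by

$$[h]_j := \frac1n\sum_{i=1}^n\Big\{Y_i[X_i]_j\mathbb{1}_{\Omega_{Y_i,X_i}} - \mathbb E\big(Y[X]_j\mathbb{1}_{\Omega_{Y,X}}\big)\Big\},\qquad [f]_j := \frac1n\sum_{i=1}^n\Big\{Y_i[X_i]_j\mathbb{1}_{\Omega_{Y_i,X_i}^c} - \mathbb E\big(Y[X]_j\mathbb{1}_{\Omega_{Y,X}^c}\big)\Big\}.$$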

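Obviously, we have $[\widehat g]_j-[g]_j = [h]_j+[f]_j$ and hence, for all $t\in S_m$,

$$\langle t,\widehat\Phi_{\widehat g}-\beta\rangle_\omega = \langle t,\widehat\Phi_{\widehat g}-\Phi_g\rangle_\omega = \langle t,\Phi_{\widehat g}-\Phi_g\rangle_\omega + \langle t,\widehat\Phi_{\widehat g}-\Phi_{\widehat g}\rangle_\omega = \langle t,\Phi_h\rangle_\omega + \langle t,\Phi_f\rangle_\omega + \langle t,\widehat\Phi_{\widehat g}-\Phi_{\widehat g}\rangle_\omega. \tag{A.2}$$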

We shall prove at the end of this section three technical lemmas (A.2 to A.4), which are used in the following steps of the proof.

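Consider now the contrast $\Upsilon$. By using (3.3) and (3.4) it follows that

$$\Upsilon(\widehat\beta_{\widehat m})+\mathrm{pen}(\widehat m)\le\Upsilon(\widehat\beta_m)+\mathrm{pen}(m)\le\Upsilon(\beta_m)+\mathrm{pen}(m),\qquad\forall\,1\le m\le M_n,$$

which in particular implies, by using the notations given in (A.1), that

$$\|\widehat\beta_{\widehat m}\|_\omega^2-\|\beta_m\|_\omega^2 \le 2\big\{\langle\widehat\beta_{\widehat m},\widehat\Phi_{\widehat g}\rangle_\omega-\langle\beta_m,\widehat\Phi_{\widehat g}\rangle_\omega\big\}+\mathrm{pen}(m)-\mathrm{pen}(\widehat m) = 2\langle\widehat\beta_{\widehat m}-\beta_m,\widehat\Phi_{\widehat g}\rangle_\omega+\mathrm{pen}(m)-\mathrm{pen}(\widehat m).$$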

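Rewriting the last estimate by using (A.2), we conclude that

$$\begin{aligned}\|\widehat\beta_{\widehat m}-\beta\|_\omega^2 &= \|\beta-\beta_m\|_\omega^2+\|\widehat\beta_{\widehat m}\|_\omega^2-\|\beta_m\|_\omega^2-2\langle\widehat\beta_{\widehat m}-\beta_m,\beta\rangle_\omega\\ &\le \|\beta-\beta_m\|_\omega^2+\mathrm{pen}(m)-\mathrm{pen}(\widehat m)+2\langle\widehat\beta_{\widehat m}-\beta_m,\widehat\Phi_{\widehat g}-\beta\rangle_\omega\\ &= \|\beta-\beta_m\|_\omega^2+\mathrm{pen}(m)-\mathrm{pen}(\widehat m)\\ &\quad+2\langle\widehat\beta_{\widehat m}-\beta_m,\Phi_h\rangle_\omega+2\langle\widehat\beta_{\widehat m}-\beta_m,\Phi_f\rangle_\omega+2\langle\widehat\beta_{\widehat m}-\beta_m,\widehat\Phi_{\widehat g}-\Phi_{\widehat g}\rangle_\omega.\end{aligned}\tag{A.3}$$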

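Consider the unit ball $B_m := \{f\in S_m : \|f\|_\omega\le1\}$ and let $m\vee\widehat m := \max(m,\widehat m)$. Combining, for $\tau>0$ and $f\in S_m$, the elementary inequality

$$2|\langle f,g\rangle_\omega|\le2\|f\|_\omega\sup_{t\in B_m}|\langle t,g\rangle_\omega|\le2\tau\|f\|_\omega^2+\frac2\tau\sup_{t\in B_m}|\langle t,g\rangle_\omega|^2$$

with (A.3) and $\widehat\beta_{\widehat m}-\beta_m\in S_{m\vee\widehat m}\subset S_{M_n}$, we obtain

$$\begin{aligned}\|\widehat\beta_{\widehat m}-\beta\|_\omega^2 &\le \|\beta-\beta_m\|_\omega^2+6\tau\|\widehat\beta_{\widehat m}-\beta_m\|_\omega^2+\mathrm{pen}(m)-\mathrm{pen}(\widehat m)\\ &\quad+\frac2\tau\sup_{t\in B_{m\vee\widehat m}}|\langle t,\Phi_h\rangle_\omega|^2+\frac2\tau\sup_{t\in B_{m\vee\widehat m}}|\langle t,\Phi_f\rangle_\omega|^2+\frac2\tau\sup_{t\in B_{m\vee\widehat m}}|\langle t,\widehat\Phi_{\widehat g}-\Phi_{\widehat g}\rangle_\omega|^2.\end{aligned}$$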
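Then, noting that $\mathrm{pen}(m\vee\widehat m)\le\mathrm{pen}(m)+\mathrm{pen}(\widehat m)$ and $\|\widehat\beta_{\widehat m}-\beta_m\|_\omega^2\le2\|\widehat\beta_{\widehat m}-\beta\|_\omega^2+2\|\beta-\beta_m\|_\omega^2$, we get, with $\tau=1/16$ and $\mathrm{pen}(m)=192\,\sigma_Y^2\eta\,\delta_m/n$, that

$$\begin{aligned}\tfrac14\|\widehat\beta_{\widehat m}-\beta\|_\omega^2 &\le \tfrac74\|\beta-\beta_m\|_\omega^2+32\Big(\sup_{t\in B_{m\vee\widehat m}}|\langle t,\Phi_h\rangle_\omega|^2-\tfrac1{32}\mathrm{pen}(m\vee\widehat m)\Big)\\ &\quad+32\sup_{t\in B_{M_n}}|\langle t,\Phi_f\rangle_\omega|^2+32\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi_{\widehat g}-\Phi_{\widehat g}\rangle_\omega|^2+\mathrm{pen}(m\vee\widehat m)+\mathrm{pen}(m)-\mathrm{pen}(\widehat m)\\ &\le \tfrac74\|\beta-\beta_m\|_\omega^2+32\sum_{m'=1}^{M_n}\Big(\sup_{t\in B_{m'}}|\langle t,\Phi_h\rangle_\omega|^2-6\sigma_Y^2\eta\frac{\delta_{m'}}n\Big)_+\\ &\quad+32\sup_{t\in B_{M_n}}|\langle t,\Phi_f\rangle_\omega|^2+32\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi_{\widehat g}-\Phi_{\widehat g}\rangle_\omega|^2+2\,\mathrm{pen}(m).\end{aligned}\tag{A.4}$$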

Combining the last bound with (A.5) in Lemma A.2 and with (A.9) and (A.10) in Lemma A.3, we conclude that there exist a numerical constant $C$ and a constant $K(\Sigma,\eta)$ depending on $\Sigma$ and $\eta$ only, such that for all $n\ge1$ and for all $1\le m\le M_n$ we have

$$\mathbb E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\le7\|\beta-\beta_m\|_\omega^2+8\,\mathrm{pen}(m)+\frac1n\,[\,\dots\,]$$

Since $(\omega/\gamma)$ is monotonically non-increasing we obtain, in case $\beta\in\mathcal F_\gamma^\rho$, that $\|\beta\|_\omega^2\le\rho$ and $\|\beta-\beta_m\|_\omega^2\le(\omega_m/\gamma_m)\rho$. Moreover, by using that $X$ and $\varepsilon$ are uncorrelated it follows that $\sigma_Y^2=\mathrm{Var}(\langle X,\beta\rangle)+\sigma^2\mathrm{Var}(\varepsilon)\le E\langle X,\beta\rangle^2+\sigma^2\le\|\beta\|^2E\|X\|^2+\sigma^2$. Hence $\sigma_Y^2\le\rho E\|X\|^2+\sigma^2$, because $\gamma$ is monotonically non-decreasing. The result now follows by combining the last estimates with the definition of the penalty, that is, $\mathrm{pen}(m)=192\sigma_Y^2\eta\delta_m/n$, which completes the proof of Theorem 3.1. □
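The penalised choice of the dimension can be illustrated with a minimal Python sketch. It assumes, as our reading of the contrast behind (A.2) rather than a statement of this appendix, that $\widehat m$ minimises $-\|\widehat\beta_m\|_\omega^2+\mathrm{pen}(m)$ over $1\le m\le M$, that $\widehat\beta_m$ has coefficients $[\widehat g]_j/\widehat\lambda_j\,\mathbb 1\{\widehat\lambda_j\ge1/n\}$ with $[\widehat g]_j=n^{-1}\sum_iY_i[X_i]_j$ and $\widehat\lambda_j=n^{-1}\sum_i[X_i]_j^2$, and that each curve is already reduced to its first $M$ real coefficients. The function `select_dimension` and the toy data in the demo are hypothetical.

```python
import math
import random


def select_dimension(Y, Xcoef, omega, delta, sigma_y2, eta):
    """Hypothetical sketch of the penalised choice of m (not the authors' code).

    Y        -- responses Y_1..Y_n
    Xcoef    -- Xcoef[i][j] holds the coefficient [X_i]_{j+1}, for j < M
    omega    -- weights omega_1..omega_M
    delta    -- penalty sequence delta_1..delta_M (treated as known here)
    Returns the 1-based m minimising -||beta_hat_m||_omega^2 + pen(m).
    """
    n, M = len(Y), len(omega)
    norm2, crit = 0.0, []
    for j in range(M):
        lam_hat = sum(x[j] ** 2 for x in Xcoef) / n            # hat lambda_j
        g_hat = sum(y * x[j] for y, x in zip(Y, Xcoef)) / n    # (1/n) sum_i Y_i [X_i]_j
        # thresholded coefficient [g_hat]_j / hat lambda_j * 1{hat lambda_j >= 1/n}
        coef = g_hat / lam_hat if lam_hat >= 1.0 / n else 0.0
        norm2 += omega[j] * coef ** 2                          # ||beta_hat_m||_omega^2, m = j + 1
        pen = 192.0 * sigma_y2 * eta * delta[j] / n            # pen(m) = 192 sigma_Y^2 eta delta_m / n
        crit.append(-norm2 + pen)
    return 1 + min(range(M), key=crit.__getitem__)


if __name__ == "__main__":
    random.seed(1)
    n, M = 200, 5
    lam = [j ** -2.0 for j in range(1, M + 1)]          # toy eigenvalues
    omega = [1.0] * M                                   # unweighted L2 risk
    beta = [j ** -2.0 for j in range(1, M + 1)]         # toy slope coefficients
    Xcoef = [[math.sqrt(l) * random.gauss(0, 1) for l in lam] for _ in range(n)]
    Y = [sum(b * x for b, x in zip(beta, xi)) + 0.1 * random.gauss(0, 1) for xi in Xcoef]
    delta = [sum(w / l for w, l in zip(omega[:m], lam[:m])) for m in range(1, M + 1)]
    print(select_dimension(Y, Xcoef, omega, delta, sigma_y2=1.0, eta=1.0))
```

The sketch keeps the sequence $\delta$ as an input, mirroring the first step of the procedure in which the degree of ill-posedness is assumed known; a large `eta` drives the choice towards $m=1$ and `eta=0` towards the largest admissible $m$.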

Technical assertions. 

The following lemmas gather technical results used in the proof of Theorem 3.1. We begin by recalling an inequality due to Talagrand [1996], which can be found, e.g., in Comte et al. [2006].

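Lemma A.1 (Talagrand's Inequality). Let $T_1,\dots,T_n$ be independent $\mathcal T$-valued random variables and $\nu_n(r)=(1/n)\sum_{i=1}^n[r(T_i)-E[r(T_i)]]$, for $r$ belonging to a countable class $\mathcal R$ of measurable functions. Then, for $\varepsilon>0$,

$$E\Big[\sup_{r\in\mathcal R}|\nu_n(r)|^2-2(1+2\varepsilon)H^2\Big]_+\le C\Big(\frac vn\exp\Big(-K_1\varepsilon\frac{nH^2}v\Big)+\frac{h^2}{n^2C^2(\varepsilon)}\exp\Big(-K_2C(\varepsilon)\sqrt\varepsilon\,\frac{nH}h\Big)\Big)$$

with $K_1=1/6$, $K_2=1/(21\sqrt2)$, $C(\varepsilon)=\sqrt{1+\varepsilon}-1$ and $C$ a universal constant, and where

$$\sup_{r\in\mathcal R}\sup_{t\in\mathcal T}|r(t)|\le h,\qquad E\Big[\sup_{r\in\mathcal R}|\nu_n(r)|\Big]\le H,\qquad\sup_{r\in\mathcal R}\frac1n\sum_{i=1}^n\mathrm{Var}(r(T_i))\le v.$$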
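Since Lemma A.1 is applied below with $\varepsilon=1$, a small Python sketch can make its constants explicit: with $\varepsilon=1$ the centring term is $2(1+2\varepsilon)H^2=6H^2$ and the second exponent becomes $-c\,nH/h$ with $c=K_2C(1)=(1-1/\sqrt2)/21$. The name `talagrand_rhs` is hypothetical, and the universal constant $C$, which the lemma leaves unspecified, is set to 1 purely as a placeholder.

```python
import math

K1 = 1.0 / 6.0
K2 = 1.0 / (21.0 * math.sqrt(2.0))


def C_eps(eps):
    """C(eps) = sqrt(1 + eps) - 1 from Lemma A.1."""
    return math.sqrt(1.0 + eps) - 1.0


def talagrand_rhs(n, H, h, v, eps=1.0, C=1.0):
    """Right-hand side of Lemma A.1; C is a placeholder for the universal constant."""
    ce = C_eps(eps)
    term_v = v / n * math.exp(-K1 * eps * n * H ** 2 / v)
    term_h = h ** 2 / (n ** 2 * ce ** 2) * math.exp(-K2 * ce * math.sqrt(eps) * n * H / h)
    return C * (term_v + term_h)


# With eps = 1 the second exponent reads -c * n * H / h, where c = K2 * C(1).
c = K2 * C_eps(1.0)
```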



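Lemma A.2. Let $\lambda$ be the eigenvalues associated to $X\in\mathcal X_\eta^4$ and let $E|Y/\sigma_Y|^4\le\eta$. Suppose that the sequences $\delta$, $\Delta$ and $M$ satisfy Assumption 3.1. Then there exists a constant $K(\Sigma,\eta,\delta_1)$ only depending on $\Sigma$, $\eta$ and $\delta_1$ such that

$$\sum_{m=1}^{M_n}E\Big[\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2-6\sigma_Y^2\eta\frac{\delta_m}n\Big]_+\le K(\Sigma,\eta,\delta_1)\frac{\sigma_Y^2}n\quad\text{for all }n\ge1.\tag{A.5}$$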
PROOF. Given $m\in\mathbb N$ and $t\in B_m:=\{f\in S_m:\|f\|_\omega\le1\}$ denote

$$\upsilon_t(y;x):=y\,\mathbb 1_{\Omega_{yx}^c}\sum_{j=1}^m\omega_j[t]_j\frac{[x]_j}{\lambda_j};$$

then it is easily seen that $\langle t,\Phi_\nu\rangle_\omega=\frac1n\sum_{i=1}^n\{\upsilon_t(Y_i;X_i)-E\,\upsilon_t(Y;X)\}$. Below we show the following three bounds:



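$$\sup_{t\in B_m}\ \sup_{y\in\mathbb R,\,x\in L^2[0,1]}|\upsilon_t(y;x)|\le\sigma_Yn^{1/3}\delta_m^{1/2}=:h,\tag{A.6}$$

$$E\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2\le\sigma_Y^2\eta\frac{\delta_m}n=:H^2,\tag{A.7}$$

$$\sup_{t\in B_m}\frac1n\sum_{i=1}^n\mathrm{Var}(\upsilon_t(Y_i;X_i))\le\sigma_Y^2\eta\Delta_m=:v.\tag{A.8}$$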
(A.8) 



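From Talagrand's inequality (Lemma A.1) with $\varepsilon=1$ we obtain, by combining (A.6)-(A.8),

$$\begin{aligned}
E\Big[\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2-6H^2\Big]_+&\le C\Big\{\frac vn\exp\Big(-\frac{nH^2}{6v}\Big)+\frac{h^2}{n^2}\exp\Big(-c\,\frac{nH}h\Big)\Big\}\\
&\le C\Big\{\frac{\sigma_Y^2\eta\Delta_m}n\exp\Big(-\frac{\delta_m}{6\Delta_m}\Big)+\sigma_Y^2\frac{\delta_m}{n^{4/3}}\exp\big(-c\,\eta^{1/2}n^{1/6}\big)\Big\}
\end{aligned}$$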



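with $c=(1-1/\sqrt2)/21$ and some numerical constant $C>0$. By using Assumption 3.1, that is $\delta_m/n\le\delta_{M_n}/n\le\delta_1$ and $M_n/n\le1$, together with $H^2=\sigma_Y^2\eta\delta_m/n$, it follows that

$$\begin{aligned}
\sum_{m=1}^{M_n}E\Big[\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2-6\sigma_Y^2\eta\frac{\delta_m}n\Big]_+&\le C\Big\{\frac{\sigma_Y^2\eta}n\sum_{m=1}^{M_n}\Delta_m\exp\Big(-\frac{\delta_m}{6\Delta_m}\Big)+\sigma_Y^2\delta_1n^{2/3}\exp\big(-c\,\eta^{1/2}n^{1/6}\big)\Big\}\\
&\le C\frac{\sigma_Y^2}n\Big\{\eta\Sigma+\delta_1\exp\big(-c\,\eta^{1/2}n^{1/6}+(5/3)\log n\big)\Big\},
\end{aligned}$$

where condition (3.1) in Assumption 3.1 implies the last inequality. It follows that there exists a constant $K(\Sigma,\eta,\delta_1)$ only depending on $\Sigma$, $\eta$ and $\delta_1$ such that

$$\sum_{m=1}^{M_n}E\Big[\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2-6\sigma_Y^2\eta\frac{\delta_m}n\Big]_+\le\frac{\sigma_Y^2}n\,K(\Sigma,\eta,\delta_1)\quad\text{for all }n\ge1,$$

which proves the result.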
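The two exponents in the application of Talagrand's inequality come from plugging the bounds (A.6)-(A.8) into Lemma A.1. A minimal Python sketch, assuming the reading $h=\sigma_Yn^{1/3}\delta_m^{1/2}$, $H^2=\sigma_Y^2\eta\delta_m/n$ and $v=\sigma_Y^2\eta\Delta_m$, checks numerically that $nH^2/(6v)=\delta_m/(6\Delta_m)$ and $nH/h=\eta^{1/2}n^{1/6}$, so that $\sigma_Y$ cancels from both exponents. The helper names are hypothetical.

```python
import math


def lemma_a2_quantities(n, sigma_y, eta, delta_m, Delta_m):
    """h, H^2 and v as read from (A.6)-(A.8)."""
    h = sigma_y * n ** (1.0 / 3.0) * math.sqrt(delta_m)   # (A.6)
    H2 = sigma_y ** 2 * eta * delta_m / n                  # (A.7)
    v = sigma_y ** 2 * eta * Delta_m                       # (A.8)
    return h, H2, v


def exponents(n, sigma_y, eta, delta_m, Delta_m):
    """The quantities n H^2 / (6 v) and n H / h entering Lemma A.1 with eps = 1."""
    h, H2, v = lemma_a2_quantities(n, sigma_y, eta, delta_m, Delta_m)
    return n * H2 / (6.0 * v), n * math.sqrt(H2) / h
```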

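Proof of (A.6). From $\sup_{t\in B_m}|\langle t,g\rangle_\omega|^2=\sum_{j=1}^m\omega_j[g]_j^2$ and the definition of $\Omega_{yx}$ it follows that

$$\sup_{t\in B_m}\ \sup_{y\in\mathbb R,\,x\in L^2[0,1]}|\upsilon_t(y;x)|^2=\sup_{y\in\mathbb R,\,x\in L^2[0,1]}\sigma_Y^2\,\mathbb 1_{\Omega_{yx}^c}\sum_{j=1}^m\frac{\omega_j}{\lambda_j}\Big|\frac y{\sigma_Y}\,\frac{[x]_j}{\sqrt{\lambda_j}}\Big|^2\le\sigma_Y^2n^{2/3}\sum_{j=1}^m\frac{\omega_j}{\lambda_j}$$

and hence the definition of $\delta_m$ implies (A.6).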
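Proof of (A.7). Since $(Y_i,X_i)$, $i=1,\dots,n$, form an $n$-sample of $(Y,X)$ we have

$$E\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2=\sum_{j=1}^m\frac{\omega_j}{\lambda_j^2}\mathrm{Var}\Big(\frac1n\sum_{i=1}^nY_i[X_i]_j\mathbb 1_{\Omega^c_{Y_iX_i}}\Big)\le\frac1n\sum_{j=1}^m\frac{\omega_j}{\lambda_j^2}E\big(Y[X]_j\big)^2$$

and hence from $E|Y/\sigma_Y|^4\le\eta$ and $X\in\mathcal X_\eta^4$ it follows that

$$E\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2\le\frac{\sigma_Y^2}n\sum_{j=1}^m\frac{\omega_j}{\lambda_j}\Big(E|Y/\sigma_Y|^4\,E\big|[X]_j/\sqrt{\lambda_j}\big|^4\Big)^{1/2}\le\frac{\sigma_Y^2\eta}n\sum_{j=1}^m\frac{\omega_j}{\lambda_j}.$$

Thereby, the definition of $\delta_m$ also implies (A.7).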

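Proof of (A.8). Consider $z:=(z_j)$ with $z_j:=(\omega_j[t]_j/\sqrt{\lambda_j})/(\sum_{l=1}^m\omega_l^2[t]_l^2/\lambda_l)^{1/2}$ and, hence, $z\in\mathbb S^m=\{z\in\mathbb R^m:\sum_{j=1}^mz_j^2=1\}$. Since $(Y_i,X_i)$, $i=1,\dots,n$, form an $n$-sample of $(Y,X)$ it follows that

$$\sup_{t\in B_m}\frac1n\sum_{i=1}^n\mathrm{Var}(\upsilon_t(Y_i;X_i))\le\sup_{t\in B_m}E\Big(Y\sum_{j=1}^m\omega_j[t]_j\frac{[X]_j}{\lambda_j}\Big)^2.$$

Thereby, from $E|Y/\sigma_Y|^4\le\eta$ and $X\in\mathcal X_\eta^4$ we conclude that

$$\begin{aligned}
\sup_{t\in B_m}\frac1n\sum_{i=1}^n\mathrm{Var}(\upsilon_t(Y_i;X_i))&\le\sup_{t\in B_m}\sigma_Y^2\big(E|Y/\sigma_Y|^4\big)^{1/2}\Big(E\Big|\sum_{j=1}^m\omega_j[t]_j\frac{[X]_j}{\lambda_j}\Big|^4\Big)^{1/2}\\
&\le\sigma_Y^2\eta^{1/2}\sup_{t\in B_m}\Big(\sum_{j=1}^m\frac{\omega_j^2[t]_j^2}{\lambda_j}\Big)\sup_{z\in\mathbb S^m}\Big(E\Big|\sum_{j=1}^mz_j\frac{[X]_j}{\sqrt{\lambda_j}}\Big|^4\Big)^{1/2}\\
&\le\sigma_Y^2\eta\sup_{t\in B_m}\sum_{j=1}^m\frac{\omega_j^2[t]_j^2}{\lambda_j}\le\sigma_Y^2\eta\max_{1\le j\le m}\frac{\omega_j}{\lambda_j}.
\end{aligned}$$

Thus the definition of $\Delta_m$ now implies (A.8), which completes the proof of Lemma A.2. □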

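Lemma A.3. Let $\lambda$ be the eigenvalues associated to $X\in\mathcal X_\xi^{24}$ and let $E|Y/\sigma_Y|^{24}\le\xi$. Suppose that the sequences $\delta$, $\Delta$ and $M$ satisfy Assumption 3.1. Then there exists a numerical constant $C$ such that

$$E\sup_{t\in B_{M_n}}|\langle t,\Phi_\tau\rangle_\omega|^2\le\sqrt2\,\xi\sigma_Y^2\frac{\delta_1}n\quad\text{and}\tag{A.9}$$

$$E\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi-\Phi\rangle_\omega|^2\le\frac{C\xi}n\{\sigma_Y^2\delta_1+\|\beta\|_\omega^2\}\{1+(E\|X\|^2)^2\}\quad\text{for all }n\ge1.\tag{A.10}$$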
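Proof. Since $(Y_i,X_i)$, $i=1,\dots,n$, form an $n$-sample of $(Y,X)$ it follows that

$$E\sup_{t\in B_{M_n}}|\langle t,\Phi_\tau\rangle_\omega|^2=\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j^2}\mathrm{Var}\Big(\frac1n\sum_{i=1}^nY_i[X_i]_j\mathbb 1_{\Omega_{Y_iX_i}}\Big)\le\frac1n\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j^2}E\big(Y[X]_j\mathbb 1_{\Omega_{YX}}\big)^2.$$

Thereby, from $E|Y/\sigma_Y|^{24}\le\xi$ and $X\in\mathcal X_\xi^{24}$ we conclude that

$$E\sup_{t\in B_{M_n}}|\langle t,\Phi_\tau\rangle_\omega|^2\le\frac{\sigma_Y^2}n\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j}\Big(E\Big|\frac Y{\sigma_Y}\,\frac{[X]_j}{\sqrt{\lambda_j}}\Big|^4\Big)^{1/2}P(\Omega_{YX})^{1/2}\le\sigma_Y^2\xi^{1/2}\frac{\delta_{M_n}}nP(\Omega_{YX})^{1/2},$$

where the last inequality follows from the property $\delta_m\ge\sum_{j=1}^m\omega_j/\lambda_j$ for all $m\ge1$. Hence, by using Assumption 3.1, that is $\delta_{M_n}/n\le\delta_1$, we obtain

$$E\sup_{t\in B_{M_n}}|\langle t,\Phi_\tau\rangle_\omega|^2\le\sigma_Y^2\delta_1\xi^{1/2}P(\Omega_{YX})^{1/2}.$$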

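The estimate (A.9) now follows from $P(\Omega_{YX})\le2\xi/n^2$, which can be realized as follows. Since $\Omega_{YX}=\{|Y/\sigma_Y|>n^{1/6}\}\cup\bigcup_{j=1}^{M_n}\{|[X]_j/\sqrt{\lambda_j}|>n^{1/6}\}$, it follows by using Markov's inequality together with $E|Y/\sigma_Y|^{24}\le\xi$ and $X\in\mathcal X_\xi^{24}$ that

$$P(\Omega_{YX})\le\frac{E|Y/\sigma_Y|^{24}}{n^4}+\sum_{j=1}^{M_n}\frac{E|[X]_j/\sqrt{\lambda_j}|^{24}}{n^4}\le\frac{(1+M_n)\,\xi}{n^4}.$$

Thus, under Assumption 3.1, that is, $M_n/n\le1$, we obtain $P(\Omega_{YX})\le2\xi/n^2$, which completes the proof of (A.9).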
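The last step is a union bound over $1+M_n$ events combined with Markov's inequality at level $n^{1/6}$. As a quick sanity check, the following Python sketch assumes that every standardised variable involved has $q$-th absolute moment at most $\xi$ (with $q=24$ by default) and confirms, for $M_n\le n$, that the resulting bound stays below $2\xi/n^2$. The function name is hypothetical.

```python
def markov_union_bound(n, M, xi, q=24):
    """Union bound for P(Omega_YX) via Markov's inequality at level n^{1/6}.

    Each of the 1 + M events {|Z| > n^{1/6}} has probability at most
    E|Z|^q / n^{q/6} <= xi / n^{q/6}, assuming E|Z|^q <= xi for every Z.
    """
    return (1 + M) * xi / n ** (q / 6.0)
```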

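Proof of (A.10). Consider the decomposition

$$\begin{aligned}
\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi-\Phi\rangle_\omega|^2&\le2\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j^2}\Big(\frac1n\sum_{i=1}^nY_i[X_i]_j-\lambda_j[\beta]_j\Big)^2\Big(\frac{\lambda_j}{\widehat\lambda_j}-1\Big)^2\mathbb 1\{\widehat\lambda_j\ge1/n\}\\
&\quad+2\sum_{j=1}^{M_n}\omega_j[\beta]_j^2\Big(\frac{\lambda_j}{\widehat\lambda_j}-1\Big)^2\mathbb 1\{\widehat\lambda_j\ge1/n\}\\
&\quad+2\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j^2}\Big(\frac1n\sum_{i=1}^nY_i[X_i]_j-\lambda_j[\beta]_j\Big)^2\mathbb 1\{\widehat\lambda_j<1/n\}+2\sum_{j=1}^{M_n}\omega_j[\beta]_j^2\mathbb 1\{\widehat\lambda_j<1/n\},
\end{aligned}\tag{A.11}$$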

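where we bound each summand separately. First, from (A.16) and (A.19) in Lemma A.4 together with $X\in\mathcal X_\xi^{24}$ and $E|Y/\sigma_Y|^{24}\le\xi$ it follows that there exists a numeric constant $C>0$ such that

$$\begin{aligned}
&E\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j^2}\Big(\frac1n\sum_{i=1}^nY_i[X_i]_j-\lambda_j[\beta]_j\Big)^2\Big(\frac{\lambda_j}{\widehat\lambda_j}-1\Big)^2\mathbb 1\{\widehat\lambda_j\ge1/n\}\\
&\quad\le\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j}\Big[E\Big|\frac{\lambda_j}{\widehat\lambda_j}-1\Big|^4\mathbb 1\{\widehat\lambda_j\ge1/n\}\Big]^{1/2}\Big[E\Big|\frac1n\sum_{i=1}^n\frac{Y_i[X_i]_j-\lambda_j[\beta]_j}{\sqrt{\lambda_j}}\Big|^4\Big]^{1/2}\le\frac{C\xi\sigma_Y^2}{n^2}\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j}\{\lambda_j^2+1\}.
\end{aligned}\tag{A.12}$$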
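Similarly, (A.19) in Lemma A.4 implies

$$E\sum_{j=1}^{M_n}\omega_j[\beta]_j^2\Big(\frac{\lambda_j}{\widehat\lambda_j}-1\Big)^2\mathbb 1\{\widehat\lambda_j\ge1/n\}\le\frac{C\xi}n\sum_{j=1}^{M_n}\omega_j[\beta]_j^2\{\lambda_j^2+1\}.\tag{A.13}$$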

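Furthermore, Assumption 3.1 (ii), i.e., $2/n\le\min\{\lambda_j:1\le j\le M_n\}$, implies $P(\widehat\lambda_j<1/n)\le P(\widehat\lambda_j/\lambda_j<1/2)$. Thereby, from (A.16) and (A.18) in Lemma A.4 together with $X\in\mathcal X_\xi^{24}$ and $E|Y/\sigma_Y|^{24}\le\xi$ it follows that there exists a numeric constant $C>0$ such that

$$\begin{aligned}
E\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j^2}\Big(\frac1n\sum_{i=1}^nY_i[X_i]_j-\lambda_j[\beta]_j\Big)^2\mathbb 1\{\widehat\lambda_j<1/n\}&\le\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j}\Big[E\Big|\frac1n\sum_{i=1}^n\frac{Y_i[X_i]_j-\lambda_j[\beta]_j}{\sqrt{\lambda_j}}\Big|^4\Big]^{1/2}P(\widehat\lambda_j/\lambda_j<1/2)^{1/2}\\
&\le\frac{C\xi\sigma_Y^2}{n^2}\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j},
\end{aligned}\tag{A.14}$$

$$E\sum_{j=1}^{M_n}\omega_j[\beta]_j^2\mathbb 1\{\widehat\lambda_j<1/n\}\le\sum_{j=1}^{M_n}\omega_j[\beta]_j^2P(\widehat\lambda_j/\lambda_j<1/2)\le\frac{C\xi}n\sum_{j=1}^{M_n}\omega_j[\beta]_j^2.\tag{A.15}$$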

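Combining the decomposition (A.11) and the bounds (A.12)-(A.15) we obtain

$$E\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi-\Phi\rangle_\omega|^2\le\frac{C\xi}n\Big\{\frac{\sigma_Y^2}n\sum_{j=1}^{M_n}\frac{\omega_j}{\lambda_j}\{\lambda_j^2+2\}+\sum_{j=1}^{M_n}\omega_j[\beta]_j^2\{\lambda_j^2+2\}\Big\}.$$

Therefore the properties $E\|X\|^2\ge\max_{j\ge1}\lambda_j$ and $\delta_m\ge\sum_{j=1}^m\omega_j/\lambda_j$ for all $m\ge1$ imply

$$E\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi-\Phi\rangle_\omega|^2\le\frac{C\xi}n\Big\{\sigma_Y^2\frac{\delta_{M_n}}n+\|\beta\|_\omega^2\Big\}\{(E\|X\|^2)^2+2\}.$$

Thus (A.10) now follows from $\delta_{M_n}/n\le\delta_1$ (Assumption 3.1), which completes the proof. □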

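Lemma A.4. Suppose $X\in\mathcal X^{4k}_{\eta_{4k}}$ and $E|Y/\sigma_Y|^{4k}\le\eta_{4k}$, $k\ge1$. Then for some numeric constant $C_k>0$ only depending on $k$ we have

$$E\Big|\frac1n\sum_{i=1}^n\frac{Y_i[X_i]_j-\lambda_j[\beta]_j}{\sqrt{\lambda_j}}\Big|^{2k}\le C_k\sigma_Y^{2k}\eta_{4k}n^{-k},\tag{A.16}$$

$$E|\widehat\lambda_j/\lambda_j-1|^{2k}\le C_k\eta_{4k}n^{-k}.\tag{A.17}$$

If in addition $w_1\ge2$ and $w_2\le1/2$, then we obtain

$$\sup_jP(\widehat\lambda_j/\lambda_j\ge w_1)\le C_k\eta_{4k}n^{-k}\quad\text{and}\quad\sup_jP(\widehat\lambda_j/\lambda_j<w_2)\le C_k\eta_{4k}n^{-k}.\tag{A.18}$$

Moreover, if $X\in\mathcal X^{12k}_{\eta_{12k}}$, $k\ge1$, then for some numeric constant $C_k>0$ only depending on $k$ we have

$$E|\lambda_j/\widehat\lambda_j-1|^{2k}\mathbb 1\{\widehat\lambda_j\ge1/n\}\le C_k\eta_{12k}\{\lambda_j^{2k}+1\}n^{-k}.\tag{A.19}$$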
Proof. Since $E\,Y[X]_j=\lambda_j[\beta]_j$, the independence within the sample of $(Y,X)$ implies, by using Theorem 2.10 in Petrov [1995], for some generic constant $C_k$ that

$$E\Big|\frac1n\sum_{i=1}^n\frac{Y_i[X_i]_j-\lambda_j[\beta]_j}{\sqrt{\lambda_j}}\Big|^{2k}\le C_k\sigma_Y^{2k}n^{-k}\Big(E|Y/\sigma_Y|^{4k}\,E\big|[X]_j/\sqrt{\lambda_j}\big|^{4k}\Big)^{1/2}.$$

Then the last estimate together with $X\in\mathcal X^{4k}_{\eta_{4k}}$ and $E|Y/\sigma_Y|^{4k}\le\eta_{4k}$ implies (A.16). Furthermore, since $\{|[X_i]_j|^2/\lambda_j-1\}_{i=1}^n$ are independent and identically distributed with mean zero, it follows by applying again Theorem 2.10 in Petrov [1995] that $E|\widehat\lambda_j/\lambda_j-1|^{2k}\le C_kn^{-k}E\big||[X]_j|^2/\lambda_j-1\big|^{2k}$. Thus, the condition $X\in\mathcal X^{4k}_{\eta_{4k}}$ implies (A.17).

Proof of (A.18). If $w\ge2$ then $P(\widehat\lambda_j/\lambda_j\ge w)\le P(|\widehat\lambda_j/\lambda_j-1|\ge1)$. Thus applying Markov's inequality together with (A.17) implies the first bound in (A.18), while the second follows in analogy.

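Proof of (A.19). By using twice the elementary inequality $|\widehat\lambda_j/\lambda_j-1|^{2k}+|\widehat\lambda_j/\lambda_j|^{2k}\ge1/2^{2k-1}$ we conclude that

$$\begin{aligned}
E|\lambda_j/\widehat\lambda_j-1|^{2k}\mathbb 1\{\widehat\lambda_j\ge1/n\}&\le2^{2k-1}\Big\{E|\widehat\lambda_j/\lambda_j-1|^{4k}\frac{\lambda_j^{2k}}{\widehat\lambda_j^{2k}}\mathbb 1\{\widehat\lambda_j\ge1/n\}+E|\widehat\lambda_j/\lambda_j-1|^{2k}\Big\}\\
&\le2^{4k-2}\lambda_j^{2k}n^{2k}E|\widehat\lambda_j/\lambda_j-1|^{6k}+2^{4k-2}E|\widehat\lambda_j/\lambda_j-1|^{4k}+2^{2k-1}E|\widehat\lambda_j/\lambda_j-1|^{2k}.
\end{aligned}$$

Thus, (A.19) follows from (A.17) since $X\in\mathcal X^{12k}_{\eta_{12k}}$, which proves the lemma. □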
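The elementary inequality used in the proof of (A.19) follows from $|x-1|+|x|\ge1$ and convexity, and it yields $|1/x-1|^{2k}\le2^{2k-1}\{|x-1|^{4k}/x^{2k}+|x-1|^{2k}\}$ for $x=\widehat\lambda_j/\lambda_j>0$. A minimal Python sketch checks both statements on a grid for $k=1,2,3$; equality is attained at $x=1/2$, so a tiny relative tolerance is built in. The function name is hypothetical.

```python
def elementary_ok(x, k):
    """Check |x-1|^{2k} + |x|^{2k} >= 2^{1-2k} and, for x > 0, its consequence
    |1/x - 1|^{2k} <= 2^{2k-1} * (|x-1|^{4k} / x^{2k} + |x-1|^{2k})."""
    base = abs(x - 1) ** (2 * k) + abs(x) ** (2 * k)
    ok = base >= 2.0 ** (1 - 2 * k) * (1 - 1e-12)
    if x > 0:
        lhs = abs(1 / x - 1) ** (2 * k)
        rhs = 2.0 ** (2 * k - 1) * (abs(x - 1) ** (4 * k) / x ** (2 * k) + abs(x - 1) ** (2 * k))
        ok = ok and lhs <= rhs * (1 + 1e-12)
    return ok
```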

A.2 Proof of Proposition 3.3

Case [P-P]. Since $2a+2s+1>0$ it follows that the sequences $\delta$, $\Delta$ and $M$ with $\delta_m\asymp m^{2a+2s+1}$, $\Delta_m\asymp m^{(2a+2s)\vee0}$ and $M_n\asymp n^{1/(2a+1+(2s)\vee0)}$, respectively, satisfy Assumption 3.1. Note that $\delta_{M_n}/n\lesssim1$, $M_n/n\le1$, $\min_{1\le j\le M_n}\lambda_j\ge2/n$ and, for all $C>0$,

$$\sum_m\Delta_m\exp(-C\delta_m/\Delta_m)\lesssim\sum_mm^{(2a+2s)\vee0}\exp\big(-Cm^{(2a+2s+1)\wedge1}\big)<+\infty.$$

Therefore we can apply Theorem 3.1 and hence Corollary 3.2. In particular, by using $m_n^\circ\asymp n^{1/(2a+2p+1)}$, which satisfies $\gamma_{m_n^\circ}\delta_{m_n^\circ}/(n\omega_{m_n^\circ})\asymp1$, it follows that the adaptive estimator reaches the optimal rate $\omega_{m_n^\circ}/\gamma_{m_n^\circ}\asymp n^{-2(p-s)/(2p+2a+1)}$.
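The balance in case [P-P] reduces to exponent arithmetic. Assuming, as the case label suggests, $\gamma_m=m^{2p}$, $\omega_m=m^{2s}$, $\lambda_m=m^{-2a}$ and $\delta_m=m^{2a+2s+1}$, one has $\gamma_m\delta_m/\omega_m=m^{2p+2a+1}$, so $m_n^\circ\asymp n^{1/(2p+2a+1)}$ and $\omega_{m_n^\circ}/\gamma_{m_n^\circ}=(m_n^\circ)^{2s-2p}$. The hypothetical helper below returns both exponents of $n$ exactly, using rational arithmetic.

```python
from fractions import Fraction


def pp_exponents(a, p, s):
    """Exponents (in n) of m_n° and of the rate omega_m/gamma_m in case [P-P].

    Assumes gamma_m = m^{2p}, omega_m = m^{2s}, lambda_m = m^{-2a} and
    delta_m = m^{2a+2s+1}, so that gamma_m * delta_m / omega_m = m^{2p+2a+1}.
    """
    a, p, s = Fraction(a), Fraction(p), Fraction(s)
    dim = 1 / (2 * p + 2 * a + 1)     # m_n° ~ n^dim balances m^{2p+2a+1} = n
    rate = (2 * s - 2 * p) * dim      # omega/gamma = m^{2s-2p} evaluated at m_n°
    return dim, rate
```

For instance, `pp_exponents(1, 2, 0)` gives the exponents of $m_n^\circ\asymp n^{1/7}$ and of the rate $n^{-4/7}$.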

Case [E-P]. The sequences $\delta$, $\Delta$, $M$ are unchanged w.r.t. the previous case [P-P] and hence Assumption 3.1 is still satisfied. From Corollary 3.2 it now follows again that the adaptive estimator $\widehat\beta_{\widehat m}$ attains the optimal rate $\omega_{m_n^\circ}/\gamma_{m_n^\circ}\asymp n^{-1}(\log n)^{(2a+1+2s)/(2p)}$, since $m_n^\circ\asymp\{\log[n(\log n)^{-(2a+1)/(2p)}]\}^{1/(2p)}$ satisfies $\gamma_{m_n^\circ}\delta_{m_n^\circ}/(n\omega_{m_n^\circ})\asymp1$.

Case [P-E]. Consider the sequences $\delta$, $\Delta$ and $M$ with $\delta_m=m^{2a+1+(2s)\vee0}\exp(m^{2a})$, $\Delta_m=m^{(2s)\vee0}\exp(m^{2a})$ and $M_n=\big(\log[n(\log n)^{-(2a+1+(2s)\vee0)/(2a)}]\big)^{1/(2a)}$, respectively. Then Assumption 3.1 is satisfied, that is $\delta_{M_n}/n\lesssim1$, $M_n/n\le1$, $\min_{1\le j\le M_n}\lambda_j\ge2/n$ and, for all $C>0$,

$$\sum_m\Delta_m\exp(-C\delta_m/\Delta_m)\le\sum_mm^{(2s)\vee0}\exp(m^{2a})\exp(-Cm^{2a+1})<+\infty.$$

Moreover, $\gamma_{m_n^\circ}\delta_{m_n^\circ}/(n\omega_{m_n^\circ})\asymp1$ implies $m_n^\circ\asymp\big(\log[n(\log n)^{-(2a+2p+1)/(2a)}]\big)^{1/(2a)}$. Finally, due to Corollary 3.2 the adaptive estimator $\widehat\beta_{\widehat m}$ again attains the optimal rate $\omega_{m_n^\circ}/\gamma_{m_n^\circ}\asymp(\log n)^{-(p-s)/a}$, which completes the proof of Proposition 3.3. □
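In case [P-E] the balance equation has no closed-form solution, which is why $m_n^\circ$ is only given up to $\asymp$. Assuming $\gamma_m\delta_m/\omega_m=m^{2a+2p+1}\exp(m^{2a})$, a minimal Python sketch solves $m^{2a+2p+1}\exp(m^{2a})=n$ by bisection on the log scale (it requires $n>1$, so that the left end of the bracket is negative). The helper `pe_dimension` is hypothetical; its output agrees with the displayed closed form only asymptotically.

```python
import math


def pe_dimension(n, a, p, tol=1e-10):
    """Solve m^{2a+2p+1} * exp(m^{2a}) = n for m >= 1 by bisection (case [P-E]).

    Works with f(m) = (2a+2p+1) log m + m^{2a} - log n, which is increasing in m.
    Requires n > 1 so that f(1) < 0.
    """
    def f(m):
        return (2 * a + 2 * p + 1) * math.log(m) + m ** (2 * a) - math.log(n)

    lo, hi = 1.0, 2.0
    while f(hi) < 0:          # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```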






A.3 Proof of Proposition 3.4

Let $\Delta_m:=\max_{1\le j\le m}\omega_j/\lambda_j$, $\kappa_m:=\max_{1\le j\le m}(\omega_j\vee1)/\lambda_j$ and $\delta_m:=m\Delta_m\log(\kappa_m\vee(m+2))/\log(m+2)$ as defined in (3.5). Note that $\log(\kappa_m\vee(m+2))/\log(m+2)\ge1$ and hence $\delta_m\ge\sum_{j=1}^m\omega_j/\lambda_j$.

Case [P-P] and [E-P]. Since $a+s\ge0$ it is easily verified that $\Delta_m\asymp m^{2a+2s}$ and $\kappa_m\asymp m^{2a+(2s)\vee0}$ with $\log(\kappa_m\vee(m+2))/\log(m+2)\asymp(2a+(2s)\vee0)\vee1$, and hence $\delta_m\asymp m^{1+2a+2s}$. Therefore, the result follows from Proposition 3.3, cases [P-P] and [E-P], since both sequences $\delta$ and $\Delta$ are unchanged.
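Definition (3.5) is easy to evaluate from the sequences $\omega$ and $\lambda$. The Python sketch below computes $\delta_m=m\Delta_m\log(\kappa_m\vee(m+2))/\log(m+2)$ with running maxima for $\Delta_m$ and $\kappa_m$, and the test checks the property $\delta_m\ge\sum_{j\le m}\omega_j/\lambda_j$ noted above, together with $\delta_m\asymp m^{1+2a+2s}$ in a toy [P-P] setting with $\omega_j=j^{2s}$, $\lambda_j=j^{-2a}$ (an assumed example, where the ratio tends to $2a+2s$). The function name `delta_35` is hypothetical.

```python
import math


def delta_35(omega, lam):
    """delta_m = m * Delta_m * log(kappa_m v (m+2)) / log(m+2) for m = 1..len(lam), with
    Delta_m = max_{j<=m} omega_j/lambda_j and kappa_m = max_{j<=m} (omega_j v 1)/lambda_j."""
    out, Delta, kappa = [], 0.0, 0.0
    for m, (w, l) in enumerate(zip(omega, lam), start=1):
        Delta = max(Delta, w / l)                 # running maximum Delta_m
        kappa = max(kappa, max(w, 1.0) / l)       # running maximum kappa_m
        out.append(m * Delta * math.log(max(kappa, m + 2)) / math.log(m + 2))
    return out
```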

Case [P-E]. We have $\Delta_m\asymp m^{2s}\exp(m^{2a})$ and $\kappa_m\asymp m^{(2s)\vee0}\exp(m^{2a})$ with, for all $m$ sufficiently large, $\log(\kappa_m\vee(m+2))/\log(m+2)\asymp m^{2a}\big(1+(2s)\vee0\cdot\frac{\log m}{m^{2a}}\big)/\log m$, and hence $\delta_m\asymp m^{1+2a+2s}\exp(m^{2a})\big(1+(2s)\vee0\cdot\frac{\log m}{m^{2a}}\big)/\log m$. A straightforward calculus shows that Assumption 3.1 (i) is fulfilled. Moreover, consider the sequence $M$ given in Assumption 3.1 (ii), where $M_n=(\log n)^{1/(2a)}(1+o(1))$; then Assumption 3.1 (ii) is also satisfied (as in the proof of case [P-E] in Proposition 3.3). Due to Corollary 3.2 it remains to balance $n\asymp\gamma_{m_n^\circ}\delta_{m_n^\circ}/\omega_{m_n^\circ}\asymp(m_n^\circ)^{1+2a+2p}\exp((m_n^\circ)^{2a})/\log m_n^\circ$, which implies $m_n^\circ=(\log n)^{1/(2a)}(1+o(1))$. Hence $\omega_{m_n^\circ}/\gamma_{m_n^\circ}\asymp(\log n)^{-(p-s)/a}$ is the rate attained by the adaptive estimator, which is optimal and completes the proof of Proposition 3.4. □

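A.4 Proof of Theorem 4.1

We begin by defining additional notation to be used in the proof. Consider sequences $\delta$, $\Delta$, $M$ and $m^\circ$ satisfying Assumption 4.1 and the random upper bound $\widehat M$ defined in (4.1). Denote by $\Omega:=\Omega_I\cap\Omega_{II}$ the event given by

$$\Omega_I:=\Big\{\Big|\frac1{\widehat\lambda_j}-\frac1{\lambda_j}\Big|<\frac1{2\lambda_j}\ \text{and}\ \widehat\lambda_j\ge1/n,\ \forall j\in\{1,\dots,M_n\}\Big\},\qquad\Omega_{II}:=\{m_n^\circ\le\widehat M_n\le M_n\}.$$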
It is easily seen that on $\Omega_I$ we have, for all $1\le m\le M_n$,

$$\tfrac12\Delta_m\le\widehat\Delta_m\le\tfrac32\Delta_m\quad\text{and}\quad\tfrac12\kappa_m\le\widehat\kappa_m\le\tfrac32\kappa_m,$$

and hence $\tfrac12[\kappa_m\vee(m+2)]\le[\widehat\kappa_m\vee(m+2)]\le\tfrac32[\kappa_m\vee(m+2)]$, which implies

$$\tfrac12m\Delta_m\frac{\log(\kappa_m\vee[m+2])}{\log(m+2)}\Big(1-\frac{\log2}{\log(\kappa_m\vee[m+2])}\Big)\le\widehat\delta_m\le\tfrac32m\Delta_m\frac{\log(\kappa_m\vee[m+2])}{\log(m+2)}\Big(1+\frac{\log(3/2)}{\log(\kappa_m\vee[m+2])}\Big);$$

together with $\log(\kappa_m\vee[m+2])/\log(m+2)\ge1$ we get

$$\frac{\delta_m}{10}\le\frac{\log(3/2)}{2\log3}\,\delta_m\le\tfrac12\delta_m\Big[1-\frac{\log2}{\log(m+2)}\Big]\le\widehat\delta_m\le\tfrac32\delta_m\Big[1+\frac{\log(3/2)}{\log(m+2)}\Big]\le3\delta_m.$$
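The numerical constants $1/10$ and $3$ in the last display come from the worst case $m=1$ of the factors $\tfrac12[1-\log2/\log(m+2)]$ and $\tfrac32[1+\log(3/2)/\log(m+2)]$. A short Python check of these factors, and of the resulting ratios $1920/10=192$ and $1920\cdot3=30\cdot192$ between the two penalties, is given below; the helper name is hypothetical.

```python
import math


def delta_hat_factors(m):
    """Lower/upper factors (1/2)[1 - log 2/log(m+2)] and (3/2)[1 + log(3/2)/log(m+2)]."""
    lower = 0.5 * (1.0 - math.log(2.0) / math.log(m + 2))
    upper = 1.5 * (1.0 + math.log(1.5) / math.log(m + 2))
    return lower, upper


# The lower factor is smallest and the upper factor largest at m = 1.
worst_lower, worst_upper = delta_hat_factors(1)
```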

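Since $\mathrm{pen}(m)=192\sigma_Y^2\eta\delta_mn^{-1}$ and $\widehat{\mathrm{pen}}(m)=1920\sigma_Y^2\eta\widehat\delta_mn^{-1}$, it follows on $\Omega_I$ that $\mathrm{pen}(m)\le\widehat{\mathrm{pen}}(m)\le30\,\mathrm{pen}(m)$ for all $1\le m\le M_n$, and hence

$$\big(\mathrm{pen}(m_n^\circ\vee\widehat m)+\widehat{\mathrm{pen}}(m_n^\circ)-\widehat{\mathrm{pen}}(\widehat m)\big)\mathbb 1_\Omega\le\big(\mathrm{pen}(m_n^\circ)+\mathrm{pen}(\widehat m)+\widehat{\mathrm{pen}}(m_n^\circ)-\widehat{\mathrm{pen}}(\widehat m)\big)\mathbb 1_\Omega\le31\,\mathrm{pen}(m_n^\circ)$$

by using $1\le\widehat m\le\widehat M_n$ and $\widehat M_n\le M_n$. On the other hand, it is not hard to see that on $\Omega_I^c$ we have $\widehat\Delta_m\le n\max_{1\le j\le m}\omega_j$ and $\widehat\kappa_m\le n$ for all $m\ge1$. From these properties we conclude that for all $1\le m\le M_n$

$$\widehat\delta_m\le mn\Big(\max_{1\le j\le m}\omega_j\Big)\frac{\log(n\vee(m+2))}{\log(m+2)}\le mn\Big(\max_{1\le j\le m}\omega_j\Big)\log(n+2),\tag{A.20}$$

which implies $\widehat{\mathrm{pen}}(m_n^\circ)\le1920\sigma_Y^2\eta M_n(\max_{1\le j\le M_n}\omega_j)\log(n+2)$ and hence

$$\begin{aligned}
\big(\mathrm{pen}(m_n^\circ\vee\widehat m)+\widehat{\mathrm{pen}}(m_n^\circ)-\widehat{\mathrm{pen}}(\widehat m)\big)\mathbb 1_{\Omega_I^c\cap\Omega_{II}}&\le\Big(\mathrm{pen}(M_n)+1920\sigma_Y^2\eta M_n\big(\max_{1\le j\le M_n}\omega_j\big)\log(n+2)\Big)\mathbb 1_{\Omega_I^c\cap\Omega_{II}}\\
&\le1920\sigma_Y^2\eta\Big(\frac{\delta_{M_n}}n+M_n\big(\max_{1\le j\le M_n}\omega_j\big)\log(n+2)\Big)\mathbb 1_{\Omega_I^c\cap\Omega_{II}}.
\end{aligned}\tag{A.21}$$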

We shall prove the technical Lemma A.5 at the end of this section; it is used in the following steps of the proof together with the technical Lemmas A.2-A.4 above.



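Consider now the decomposition

$$E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2=E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_\Omega+E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_I^c\cap\Omega_{II}}+E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_{II}^c}.\tag{A.22}$$

Below we show that there exist a numerical constant $C'>0$ and a constant $K'=K'(\Sigma,\eta,\xi,\delta_1)$ only depending on $\Sigma$, $\eta$, $\xi$ and $\delta_1$ such that for all $n\ge1$ we have

$$E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_\Omega\le C'\Big\{\|\beta-\beta_{m_n^\circ}\|_\omega^2+\sigma_Y^2\eta\frac{\delta_{m_n^\circ}}n+\frac{K'}n[\sigma_Y^2+\|\beta\|_\omega^2][1+(E\|X\|^2)^2]\Big\},\tag{A.23}$$

$$E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_I^c\cap\Omega_{II}}\le C'\Big\{\|\beta-\beta_{m_n^\circ}\|_\omega^2+\frac{K'}n[\sigma_Y^2+\|\beta\|_\omega^2][1+(E\|X\|^2)^2]\Big\},\tag{A.24}$$

$$E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_{II}^c}\le\frac{C'}n\,\xi\,[\sigma_Y^2+\|\beta\|_\omega^2][1+E\|X\|^2]^2.\tag{A.25}$$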

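Since $(\omega/\gamma)$ is monotonically non-increasing we obtain, in case $\beta\in\mathcal F_\gamma^\rho$, that $\|\beta\|_\omega^2\le\rho$ and $\|\beta-\beta_{m_n^\circ}\|_\omega^2\le(\omega_{m_n^\circ}/\gamma_{m_n^\circ})\rho$. Moreover, we have $\sigma_Y^2\le\rho E\|X\|^2+\sigma^2$. From these properties, by combining the decomposition (A.22) and the estimates (A.23)-(A.25), we conclude that there exist a numerical constant $C>0$ and a constant $K=K(\Sigma,\eta,\xi,\delta_1)$ only depending on $\Sigma$, $\eta$, $\xi$ and $\delta_1$ such that for all $n\ge1$

$$E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\le C\Big\{\frac{\omega_{m_n^\circ}}{\gamma_{m_n^\circ}}\rho+[\rho E\|X\|^2+\sigma^2]\,\eta\,\frac{\delta_{m_n^\circ}}n+\frac Kn[\rho E\|X\|^2+\sigma^2][1+\rho][1+(E\|X\|^2)^2]\Big\}.$$

The result now follows from the definition of $m_n^\circ$, that is, $\gamma_{m_n^\circ}\delta_{m_n^\circ}/(n\omega_{m_n^\circ})\le C$.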

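Proof of (A.23). Observe that on $\Omega$ we have $m_n^\circ\le\widehat M_n\le M_n$. Thus, following line by line the proof of (A.4), it is easily seen that

$$\begin{aligned}
\tfrac14\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_\Omega&\le\tfrac74\|\beta-\beta_{m_n^\circ}\|_\omega^2+32\sum_{m=1}^{M_n}\Big(\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2-6\sigma_Y^2\eta\frac{\delta_m}n\Big)_+\\
&\quad+32\sup_{t\in B_{M_n}}|\langle t,\Phi_\tau\rangle_\omega|^2+32\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi-\Phi\rangle_\omega|^2\\
&\quad+\big(\mathrm{pen}(m_n^\circ\vee\widehat m)+\widehat{\mathrm{pen}}(m_n^\circ)-\widehat{\mathrm{pen}}(\widehat m)\big)\mathbb 1_\Omega\\
&\le\tfrac74\|\beta-\beta_{m_n^\circ}\|_\omega^2+32\sum_{m=1}^{M_n}\Big(\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2-6\sigma_Y^2\eta\frac{\delta_m}n\Big)_+\\
&\quad+32\sup_{t\in B_{M_n}}|\langle t,\Phi_\tau\rangle_\omega|^2+32\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi-\Phi\rangle_\omega|^2+31\,\mathrm{pen}(m_n^\circ),
\end{aligned}$$

where the last inequality follows from (A.20). Combining the last bound with (A.5) in Lemma A.2, and (A.9) and (A.10) in Lemma A.3, we conclude that there exist a numerical constant $C'>0$ and a constant $K'=K'(\Sigma,\eta,\xi,\delta_1)$ depending on $\Sigma$, $\eta$, $\xi$ and $\delta_1$ only such that (A.23) holds true for all $n\ge1$.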

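Proof of (A.24). Note that on $\Omega_I^c\cap\Omega_{II}$ we still have $m_n^\circ\le\widehat M_n\le M_n$. Thus, by using (A.21) rather than (A.20), it follows in analogy to the proof of (A.23) that

$$\begin{aligned}
\tfrac14\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_I^c\cap\Omega_{II}}&\le\tfrac74\|\beta-\beta_{m_n^\circ}\|_\omega^2+32\sum_{m=1}^{M_n}\Big(\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2-6\sigma_Y^2\eta\frac{\delta_m}n\Big)_+\\
&\quad+32\sup_{t\in B_{M_n}}|\langle t,\Phi_\tau\rangle_\omega|^2+32\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi-\Phi\rangle_\omega|^2\\
&\quad+\big(\mathrm{pen}(m_n^\circ\vee\widehat m)+\widehat{\mathrm{pen}}(m_n^\circ)-\widehat{\mathrm{pen}}(\widehat m)\big)\mathbb 1_{\Omega_I^c\cap\Omega_{II}}\\
&\le\tfrac74\|\beta-\beta_{m_n^\circ}\|_\omega^2+32\sum_{m=1}^{M_n}\Big(\sup_{t\in B_m}|\langle t,\Phi_\nu\rangle_\omega|^2-6\sigma_Y^2\eta\frac{\delta_m}n\Big)_+\\
&\quad+32\sup_{t\in B_{M_n}}|\langle t,\Phi_\tau\rangle_\omega|^2+32\sup_{t\in B_{M_n}}|\langle t,\widehat\Phi-\Phi\rangle_\omega|^2\\
&\quad+1920\sigma_Y^2\eta\Big(\frac{\delta_{M_n}}n+M_n\big(\max_{1\le j\le M_n}\omega_j\big)\log(n+2)\Big)\mathbb 1_{\Omega_I^c\cap\Omega_{II}}.
\end{aligned}$$

From the last bound together with (A.5) in Lemma A.2, and (A.9) and (A.10) in Lemma A.3, we conclude that there exist a numerical constant $C>0$ and a constant $K=K(\Sigma,\eta,\xi,\delta_1)$ depending on $\Sigma$, $\eta$, $\xi$ and $\delta_1$ only such that for all $n\ge1$ we have

$$\begin{aligned}
E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_I^c\cap\Omega_{II}}&\le C\Big\{\|\beta-\beta_{m_n^\circ}\|_\omega^2+\frac Kn[\sigma_Y^2+\|\beta\|_\omega^2][1+(E\|X\|^2)^2]\\
&\quad+\sigma_Y^2\eta\Big(n^{-1}\delta_{M_n}+n^{-2}M_n\max_{1\le j\le M_n}\omega_j\Big)n^2\log(n+2)\,P(\Omega_I^c\cap\Omega_{II})\Big\}.
\end{aligned}\tag{A.26}$$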
Since $X\in\mathcal X_\xi^{28}$ and $\Omega_I^c\cap\Omega_{II}\subset\{\exists j\in\{1,\dots,M_n\}:|\lambda_j/\widehat\lambda_j-1|\ge1/2\text{ or }\widehat\lambda_j<1/n\}$, it follows from (A.29) in Lemma A.5 that $P(\Omega_I^c\cap\Omega_{II})\le C\xi M_nn^{-6}$ for some numerical constant $C>0$. Moreover, due to Assumption 4.1 we have $\delta_{M_n}/n\le\delta_1$, $M_n/n\le1$ and $\max_{1\le j\le M_n}\omega_j\le\max_{1\le j\le N_n}\omega_j\le n$. Combining the last estimates and (A.26) now implies (A.24).

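Proof of (A.25). Let $\widetilde\beta_m:=\sum_{j=1}^m[\beta]_j\mathbb 1\{\widehat\lambda_j\ge1/n\}\psi_j$. Then it is not hard to see that $\|\widehat\beta_m-\widetilde\beta_m\|_\omega^2\le\|\widehat\beta_{m'}-\widetilde\beta_{m'}\|_\omega^2$ for all $m\le m'$ and $\|\widetilde\beta_m-\beta\|_\omega^2\le\|\beta\|_\omega^2$. By using these properties together with $1\le\widehat m\le\widehat M_n\le N_n$ we conclude

$$E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_{II}^c}\le2\{E\|\widehat\beta_{\widehat m}-\widetilde\beta_{\widehat m}\|_\omega^2\mathbb 1_{\Omega_{II}^c}+E\|\widetilde\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_{II}^c}\}\le2\{E\|\widehat\beta_{N_n}-\widetilde\beta_{N_n}\|_\omega^2\mathbb 1_{\Omega_{II}^c}+\|\beta\|_\omega^2P(\Omega_{II}^c)\}.$$

Since $X\in\mathcal X_\xi^{28}$ and $\Omega_{II}^c=\{\widehat M_n<m_n^\circ\}\cup\{\widehat M_n>M_n\}$, it follows from (A.30) and (A.31) in Lemma A.5 that $P(\Omega_{II}^c)\le C\xi n^{-6}$ for some numerical constant $C>0$, and hence

$$E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_{II}^c}\le2\{E\|\widehat\beta_{N_n}-\widetilde\beta_{N_n}\|_\omega^2\mathbb 1_{\Omega_{II}^c}+C\xi\|\beta\|_\omega^2n^{-6}\}.\tag{A.27}$$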

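Moreover, from (A.16) and (A.17) in Lemma A.4 together with $X\in\mathcal X_\xi^{28}$ and $E|Y/\sigma_Y|^{28}\le\xi$ it follows that there exists a numerical constant $C>0$ such that

$$\begin{aligned}
E\|\widehat\beta_{N_n}-\widetilde\beta_{N_n}\|_\omega^2\mathbb 1_{\Omega_{II}^c}&\le2n^2\sum_{j=1}^{N_n}\omega_j\Big\{E\Big(\frac1n\sum_{i=1}^nY_i[X_i]_j-\lambda_j[\beta]_j\Big)^2\mathbb 1_{\Omega_{II}^c}+E\big(\lambda_j[\beta]_j-\widehat\lambda_j[\beta]_j\big)^2\mathbb 1_{\Omega_{II}^c}\Big\}\\
&\le2n^2\Big\{\max_{1\le j\le N_n}\omega_j\sum_{j=1}^{N_n}\lambda_j\Big[E\Big|\frac1n\sum_{i=1}^n\frac{Y_i[X_i]_j-\lambda_j[\beta]_j}{\sqrt{\lambda_j}}\Big|^4\Big]^{1/2}P(\Omega_{II}^c)^{1/2}\\
&\qquad+\max_{j\ge1}\lambda_j\sum_{j=1}^{N_n}\lambda_j\omega_j[\beta]_j^2\big[E(\widehat\lambda_j/\lambda_j-1)^4\big]^{1/2}P(\Omega_{II}^c)^{1/2}\Big\}\\
&\le C\xi n^2\Big\{n^{-4}\sigma_Y^2\max_{1\le j\le N_n}\omega_j\sum_{j\ge1}\lambda_j+n^{-4}\big(\max_{j\ge1}\lambda_j\big)^2\|\beta\|_\omega^2\Big\}.
\end{aligned}\tag{A.28}$$

By combination of (A.27), (A.28) and $E\|X\|^2=\sum_{j\ge1}\lambda_j\ge\max_{j\ge1}\lambda_j$ we obtain

$$E\|\widehat\beta_{\widehat m}-\beta\|_\omega^2\mathbb 1_{\Omega_{II}^c}\le C'\Big\{n^{-2}\sigma_Y^2\xi\max_{1\le j\le N_n}\omega_j\,E\|X\|^2+\xi\{1+E\|X\|^2\}^2\|\beta\|_\omega^2n^{-2}\Big\},$$

for some numerical constant $C'>0$. The estimate (A.25) now follows from $\max_{1\le j\le N_n}\omega_j\le n$ (Assumption 4.1), which completes the proof of Theorem 4.1. □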

Technical assertions. 

The following lemma gathers technical results used in the proof of Theorem 4.1. 

Lemma A.5. Suppose $X\in\mathcal X^{4k}_{\eta_{4k}}$, $k\ge1$, with associated sequence $\lambda$ of eigenvalues. Let $M$ and $m^\circ$ be sequences satisfying Assumption 4.1. Then there exists a numerical constant $C_k>0$ only depending on $k$ such that for all $n\ge1$ we have

$$P\big(\exists j\in\{1,\dots,M_n\}:|\lambda_j/\widehat\lambda_j-1|>1/2\text{ or }\widehat\lambda_j<1/n\big)\le C_k\eta_{4k}M_nn^{-k},\tag{A.29}$$

$$P(\widehat M_n<m_n^\circ)\le C_k\eta_{4k}n^{-k}\quad\text{and}\tag{A.30}$$

$$P(\widehat M_n>M_n)\le C_k\eta_{4k}n^{-k+1}.\tag{A.31}$$
Proof. Proof of (A.29). We start our proof with the observation that the event $\{|\lambda_j/\widehat\lambda_j-1|>1/2\}$ can equivalently be written as $\{1-\widehat\lambda_j/\lambda_j>1/3\text{ or }\widehat\lambda_j/\lambda_j-1>1\}$, and hence is a subset of $\{|\widehat\lambda_j/\lambda_j-1|>1/3\}$. Moreover, since $\lambda_j\ge2/n$ for all $1\le j\le M_n$, it follows that $\{\widehat\lambda_j<1/n\}\subset\{|\widehat\lambda_j/\lambda_j-1|>1/2\}$. Combining both estimates we conclude

$$\begin{aligned}
P\big(\exists j\in\{1,\dots,M_n\}:|\lambda_j/\widehat\lambda_j-1|>1/2\text{ or }\widehat\lambda_j<1/n\big)&\le\sum_{j=1}^{M_n}\{P(|\widehat\lambda_j/\lambda_j-1|>1/3)+P(|\widehat\lambda_j/\lambda_j-1|>1/2)\}\\
&\le2\sum_{j=1}^{M_n}P(|\widehat\lambda_j/\lambda_j-1|>1/3).
\end{aligned}$$

Thus applying Markov's inequality together with (A.17) in Lemma A.4 implies (A.29).
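Both set inclusions in the proof of (A.29) are statements about the single ratio $x=\widehat\lambda_j/\lambda_j>0$, so they can be checked exactly with rational arithmetic. The Python sketch below verifies on a grid, including the boundary points $x=2/3$ and $x=2$, that $|1/x-1|>1/2$ is equivalent to $\{1-x>1/3\text{ or }x-1>1\}$, and the test also checks that $\lambda_j\ge2/n$ together with $\widehat\lambda_j<1/n$ forces $|x-1|>1/2$. The function names are hypothetical.

```python
from fractions import Fraction


def event_a(x):
    """|lambda/lambda_hat - 1| > 1/2, written in terms of x = lambda_hat/lambda > 0."""
    return abs(1 / x - 1) > Fraction(1, 2)


def event_b(x):
    """The equivalent form {1 - x > 1/3 or x - 1 > 1}."""
    return (1 - x > Fraction(1, 3)) or (x - 1 > 1)
```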

Proof of (A.30). Due to the definition of $\widehat M_n$ given in (4.1), the event $\{\widehat M_n<m_n^\circ\}$ is a subset of $\{\forall m\in\{m_n^\circ,\dots,n\}:\widehat\lambda_m/(\omega_m\vee1)<m(\log n)/n\}$ and hence $P(\widehat M_n<m_n^\circ)\le P(\widehat\lambda_{m_n^\circ}/\lambda_{m_n^\circ}<1/2)$, since $\min_{1\le m\le m_n^\circ}\lambda_m/[m(\omega_m\vee1)]>2(\log n)/n$ (Assumption 4.1 (iii)). Thereby, (A.30) follows from the second bound in (A.18) in Lemma A.4.

Proof of (A.31). Due to the definition (4.1) of $\widehat M_n$, for $m>M_n$ the event $\{\widehat M_n=m\}$ is a subset of $\{\widehat\lambda_m/(\omega_m\vee1)\ge m(\log n)/n\}$ and hence $P(\widehat M_n>M_n)\le\sum_{m=M_n+1}^{N_n}P(\widehat\lambda_m/\lambda_m\ge2)$, since $2\max_{m>M_n}\lambda_m/[m(\omega_m\vee1)]\le(\log n)/n$ (Assumption 4.1 (ii)). Thereby, the first bound in (A.18) in Lemma A.4 together with $N_n/n\le1$ (Assumption 4.1 (iv)) implies (A.31), which completes the proof of Lemma A.5. □
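Definition (4.1) itself is not reproduced in this appendix. The following Python sketch encodes one reading that is consistent with the inclusions used in the proofs of (A.30) and (A.31): $\widehat M_n$ is taken to be the largest $m\le N_n$ with $\widehat\lambda_m/(\omega_m\vee1)\ge m(\log n)/n$. This is an assumption, not the paper's stated definition, and the function name `random_upper_bound` is hypothetical.

```python
import math


def random_upper_bound(lam_hat, omega, n, N):
    """Hypothetical reading of (4.1): the largest m <= N with
    lam_hat_m / (omega_m v 1) >= m * log(n) / n, and 0 if no index qualifies."""
    best = 0
    for m in range(1, N + 1):
        if lam_hat[m - 1] / max(omega[m - 1], 1.0) >= m * math.log(n) / n:
            best = m
    return best
```

Under this reading, $\{\widehat M_n<m_n^\circ\}$ forces $\widehat\lambda_{m_n^\circ}/(\omega_{m_n^\circ}\vee1)<m_n^\circ(\log n)/n$, and $\{\widehat M_n=m\}$ forces the threshold to be met at $m$; these are exactly the two inclusions invoked above.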



A.5 Proof of Corollary 4.2

First, note that in all three cases the sequences $\delta$, $\Delta$, $M$ and $m^\circ$ have been calculated in the proof of Proposition 3.4. If in addition Assumption 4.1 holds true, then it follows from Theorem 4.1 that the fully adaptive estimator attains the rate $\omega_{m_n^\circ}/\gamma_{m_n^\circ}$, which in the proof of Proposition 3.4 has been confirmed to be optimal in all three cases. Therefore it only remains to check (i)-(iii) of Assumption 4.1.

Case [P-P]. In this case we have $M_n\asymp n^{1/(2a+1+(2s)\vee0)}$ and $m_n^\circ\asymp n^{1/(2a+2p+1)}$. Then (i) of Assumption 4.1 holds true, since $\min_{1\le j\le M_n}\lambda_j\asymp M_n^{-2a}\asymp n^{-2a/(2a+1+(2s)\vee0)}\ge2/n$ and

$$\max_{m\ge M_n}\frac{\lambda_m}{m(\omega_m\vee1)}\asymp M_n^{-1-2a-(2s)\vee0}\asymp n^{-(2a+1+(2s)\vee0)/(1+2a+(2s)\vee0)}\le\frac{\log n}{2n}.$$

Moreover, (ii) of Assumption 4.1 is satisfied by using that for all $p>s$

$$\min_{1\le m\le m_n^\circ}\frac{\lambda_m}{m(\omega_m\vee1)}\asymp(m_n^\circ)^{-1-2a-(2s)\vee0}\asymp n^{-(2a+1+(2s)\vee0)/(2a+2p+1)}\ge2\,\frac{\log n}n.$$

Finally, consider (iii) of Assumption 4.1. It is easily verified that $N_n\asymp n^{1/(1+(2s)\vee0)}$, which satisfies $\max_{1\le m\le N_n}\omega_m\asymp N_n^{(2s)\vee0}\asymp n^{((2s)\vee0)/(1+(2s)\vee0)}\le n$, and $M_n\asymp n^{1/(2a+1+(2s)\vee0)}\le N_n\le n$. Thereby (iii) of Assumption 4.1 also holds true.

Case [E-P]. We have $M_n\asymp n^{1/(2a+1+(2s)\vee0)}$, $m_n^\circ\asymp\{\log[n(\log n)^{-(2a+1)/(2p)}]\}^{1/(2p)}$ and $N_n\asymp n^{1/(1+(2s)\vee0)}$. Then, as in case [P-P], (i) and (iii) of Assumption 4.1 hold true since $M_n$ and $N_n$ are unchanged. Furthermore, for all $s\in\mathbb R$ we have

$$\min_{1\le m\le m_n^\circ}\frac{\lambda_m}{m(\omega_m\vee1)}\asymp(m_n^\circ)^{-1-2a-(2s)\vee0}\asymp(\log n)^{-(2a+1+(2s)\vee0)/(2p)}\ge2\,\frac{\log n}n,$$

which shows (ii) of Assumption 4.1.
Case [P-E]. Here we have $M_n=(\log n)^{1/(2a)}(1+o(1))$, $m_n^\circ=(\log n)^{1/(2a)}(1+o(1))$ and $N_n\asymp n^{1/(1+(2s)\vee0)}$. It is easily seen that (iii) of Assumption 4.1 is satisfied. Moreover, (i) of Assumption 4.1 holds true, since $\min_{1\le j\le M_n}\lambda_j\asymp\exp(-M_n^{2a})\ge2/n$ and

$$\max_{m\ge M_n}\frac{\lambda_m}{m(\omega_m\vee1)}\asymp M_n^{-1-(2s)\vee0}\exp(-M_n^{2a})\le\frac{\log n}{2n}.$$

Finally, consider (ii) of Assumption 4.1, which is satisfied by using that for all $p>s$

$$\min_{1\le m\le m_n^\circ}\frac{\lambda_m}{m(\omega_m\vee1)}\asymp(m_n^\circ)^{-1-(2s)\vee0}\exp\big(-(m_n^\circ)^{2a}\big)\ge2\,\frac{\log n}n,$$

which completes the proof of Corollary 4.2. □